LIVER NECROSIS PREDICTIVE GENES

Inventors: Larry D. Kier, Timothy D. Nolan, Usha Sankar and 
Maher Derbel 

Cross Reference to Other Patent Applications 

[01] This application claims priority to U.S. provisional application 60/369,287 filed 

1 April 2002, which is hereby incorporated by reference in its entirety. 

Reference to a Sequence Listing and Tables 

[02] This application contains a gene sequence listing and 4 tables submitted on 

a compact disc whose file name is "2874-022PCT" created on 1 April 2003 containing 
4 files and is herein incorporated by reference in its entirety. The files are: a) Table 
32.xls, 214KB, b) Table 34.xls, 525KB, c) Table 35.xls, 626KB, and d) Table 36.xls,
576KB, all in Microsoft Excel™.

[03] The contents of the files contained on the CD-ROM discs submitted with this 

application are hereby incorporated by reference into the specification. 

Background 

[04] This invention is in the field of toxicology. More specifically, it relates to toxicity

predictive genes and the methods of using such genes to predict toxicity. 

[05] Molecular biology and genomics technologies have potential to create 

dramatic advances and improvements for the science of toxicology as for other 
biological sciences. See, for example, MacGregor, et al. Fund. Appl. Tox. 26:156- 
173, 1995; Rodi et al., Tox. Pathology 27:107-110, 1999; Cunningham et al., Ann. 
N.Y. Acad. Sci. 919: 52-67, 2000; Pritchard et al., Proc. Natl. Acad. Sci. USA
98:13266-13271, 2001; and Fielden and Zacharewski, Tox. Sciences 60: 6-10, 2001. 
The advantage of these technologies is that they can provide massive amounts of 
parallel information and that this information concerns processes and events
occurring at the molecular level. This level of information is in dramatic contrast to 
conventional safety assessment toxicology that, to a large extent, currently relies on 
subjective evaluation (e.g., in-life observations of behavior, observations of gross 
abnormalities at necropsy and histopathological examination of stained tissue slides 
using a microscope). These current methodologies may be largely subjective and in 
some cases such as histopathological evaluation, they require someone with a high 
degree of training, experience and skill to make competent evaluations. 
Furthermore, many of the methodologies require access to organs and tissues that 
necessitates either killing laboratory animals or surgery to obtain tissue specimens. 

[06] Recently, there have been some initial efforts to apply molecular biology and 

genomics technologies to toxicology. Some efforts have involved application of gene 
expression measurements. See, for example, U.S. Patent 6,228,589 and WO 
01/05804. Analysis of the data has yielded interesting observations of gene 
expressions that appear to correlate with some toxic effects or mechanisms. See, for 
example, Mueller et al. Environmental Health Perspectives 106(5): 277-230 (1998). 
However, there has been very little published work in toxicology so far that applies 
rigorous analytical and statistical techniques to the massive amounts of data 
available from genomics technologies. The observations, so far, have tended to be 
phenomenological and focused on individual gene responses rather than determining 
the generally applicable capabilities of patterns of gene expression to predict toxic 
effects (see, for example, studies of gene expression altered by exposure to 
toxicants in Bartosiewicz et al., Environ. Health Perspectives 109:71-74, 2001; Huang
et al., Tox. Sciences 63: 196-207, 2001). Even in the larger field of biological
sciences, these types of analyses are just beginning to be evidenced in the literature
(e.g., Golub et al., Science 286: 531-537, 1999).

[07] U.S. Patent Number 6,228,589 (Brenner) shows a method for assessing the 

toxicity of a compound in a test organism by measuring gene expression profiles of 
selected tissue. 

[08] Recently some work has been published that attempts to correlate gene

expression profiles with the mechanism of toxicity of various hepatotoxins. See, for
example, Waring et al. Tox. and Appl. Pharm. 175:28-42 (2001). However, there has
been limited success thus far in the attempts to predict toxicity of compounds based 
on the gene expression profiles elicited upon treatment. 

[09] What is needed are genes and predictive models that are capable of

predicting toxicity response. 

Summary 

[10] The invention provides toxicity predictive genes and predictive models that 

are useful to predict toxic responses to one or more agents. 

[11] One aspect of the present invention provides methods of predicting toxicity in

an individual to an agent. One method includes the steps of: (a) obtaining a 
biological sample from an individual treated with the agent or treating a biological 
sample obtained from an individual with the agent or treating in vitro cultured cells or 
explants with the agent; (b) obtaining a gene expression profile on one or more of the 
toxicity predictive genes disclosed herein from the biological sample or in vitro 
cultured cells or explants; and (c) using the gene expression profiles from the 
biological sample or cells treated with the agent as a test set and a database of gene 
expression profiles and toxicity classifications as a training set and using toxicity 
predictive genes and a Predictive Model to assay whether the agent will induce liver 
toxicity in the individual or would be predicted to produce liver toxicity following in 
vivo exposure. 
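
By way of illustration only, step (c) can be sketched as a nearest-neighbor classification in which the training set supplies labeled reference profiles and the test profile receives the majority label of its closest neighbors. The specification does not fix the form of the Predictive Model at this point, so the choice of a k-nearest-neighbor vote with Euclidean distance is an assumption (the description of Figure 4 mentions 10 nearest neighbors for one case). The function names, expression values, and the value k=3 are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(training_set, test_profile, k=3):
    """Classify a test profile by majority vote of its k nearest training profiles.

    training_set: list of (profile, toxicity_label) pairs, where each profile
    holds expression values for the same toxicity predictive genes.
    """
    neighbors = sorted(
        training_set, key=lambda item: euclidean(item[0], test_profile)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical expression values for three predictive genes (not from the Tables).
training_set = [
    ([2.1, 1.8, -0.9], "toxic"),
    ([1.9, 2.2, -1.1], "toxic"),
    ([2.4, 1.5, -0.7], "toxic"),
    ([0.1, -0.2, 0.0], "non-toxic"),
    ([-0.1, 0.3, 0.2], "non-toxic"),
    ([0.2, 0.1, -0.1], "non-toxic"),
]

prediction = knn_predict(training_set, [2.0, 1.7, -0.8], k=3)
print(prediction)
```

An odd k is used here so that a two-class vote cannot tie; other distance measures or voting rules could be substituted without changing the overall flow.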

[12] Another aspect of the present invention provides that the predictive model 

utilizes expression profiles from sets of toxicity predictive gene(s) selected from 
Combination 5, infra, wherein the set is one or more toxicity predictive gene(s). In 
other aspects, the predictive model utilizes expression profiles from sets of one or 
more toxicity predictive gene(s) selected from Combination 4, 3, 2, or 1, wherein the 
set is one or more toxicity predictive gene(s).

[13] Yet another aspect of the present invention provides methods for determining

the presence or absence of a no-observable effect level (NOEL) of an agent in an 
individual. One method includes the steps of: (a) obtaining biological samples from 
individuals treated with the agent at different dose levels or treating a biological 
sample obtained from an individual with different dose levels of the agent or treating
in vitro cultured cells or explants with different dose levels of the agent; (b)
obtaining gene expression profiles of the samples; and (c) using the gene expression 
profile from the biological samples as a test set and a database of gene expression 
profiles and toxicity classifications as a training set and using toxicity predictive 
genes and a Predictive Model to determine or predict whether and at which dose 
levels the agent will induce toxicity. 
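
As a minimal sketch of the final determination in step (c), assume that each dose level has already received a predicted call from the Predictive Model. The predicted NOEL can then be taken as the highest dose below the lowest dose called toxic. The function name, the dose values, and the rule of stopping at the first toxic call are illustrative assumptions, not a procedure stated in the specification.

```python
def predicted_noel(dose_calls):
    """Return the highest dose below the lowest dose predicted to be toxic.

    dose_calls: dict mapping a numeric dose level to a predicted call,
    either "toxic" or "non-toxic". Returns None when even the lowest
    tested dose is predicted toxic (no NOEL among the tested doses).
    """
    noel = None
    for dose in sorted(dose_calls):
        if dose_calls[dose] == "toxic":
            break
        noel = dose
    return noel


# Hypothetical dose levels (e.g., mg/kg) and predicted calls.
calls = {10: "non-toxic", 50: "non-toxic", 250: "toxic"}
print(predicted_noel(calls))
```

If every tested dose is called non-toxic, the sketch returns the highest tested dose, which indicates only that no toxic dose was observed within the tested range.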

[14] Another aspect of the present invention provides that the predictive model
utilizes sets of toxicity predictive gene(s) selected from Combination 5, wherein the
set is one or more toxicity predictive gene(s). In other aspects, the predictive model
utilizes sets of toxicity predictive gene(s) selected from Combination 4, 3, 2, or 1,
wherein the set is one or more toxicity predictive gene(s). 

[15] A further aspect of the present invention provides that the predictive genes 

and models may be used with an in vitro system to identify in vitro systems that can 
be used to accurately predict in vivo toxicity and to use the identified in vitro systems 
to accurately predict in vivo toxicity. 

[16] Another aspect of the present invention provides methods of identifying 

toxicity predictive genes. One method includes the steps of: (a) providing a set of 
candidate toxicity predictive genes; (b) evaluating the genes for their predictive 
performance with at least one training and test set of data in a Predictive Model to 
identify genes which are predictive of toxicity; and (c) testing the performance of 
predictive genes for their ability to predict toxicity for different training and test sets of 
data, for prediction of accurate compared to random classification and prediction of
test data external to the data used to derive the predictive genes. A further 
embodiment provides the candidate toxicity predictive genes are rat toxicity genes. 

[17] Yet another aspect of the present invention provides a computer-based

method for mining genes predictive for toxicity. One method includes the steps of 
collecting expression levels of a plurality of candidate toxicity predictive genes in a 
multiplicity of samples; optionally storing the expression levels as a database on an 
electronic medium; defining a group of samples to be a training set; defining another 
group of samples to be a test set; optionally generating additional training and test 
sets; and selecting a set of genes which are predictive of toxicity based on 
evaluating the training set and the test set in a Predictive Model. 
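
The steps above can be sketched as follows, under stated assumptions. Each candidate gene is scored by how accurately a single-gene nearest-neighbor rule, built from the training set, calls the samples in the test set, and genes meeting an accuracy threshold are retained. The single-gene rule, the 0.9 threshold, the gene names, and the expression values are hypothetical and are used only to make the flow concrete.

```python
def nearest_label(train_values, train_labels, value):
    """Call a sample using the label of the closest training value for one gene."""
    idx = min(range(len(train_values)), key=lambda i: abs(train_values[i] - value))
    return train_labels[idx]


def gene_accuracy(gene_values, labels, train_idx, test_idx):
    """Fraction of test samples called correctly using one gene alone."""
    train_values = [gene_values[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    correct = sum(
        nearest_label(train_values, train_labels, gene_values[i]) == labels[i]
        for i in test_idx
    )
    return correct / len(test_idx)


def mine_predictive_genes(expression, labels, train_idx, test_idx, min_accuracy=0.9):
    """Select candidate genes whose test-set accuracy meets the threshold."""
    return [
        gene
        for gene, values in expression.items()
        if gene_accuracy(values, labels, train_idx, test_idx) >= min_accuracy
    ]


# Hypothetical database: one toxicity classification and one value per gene per sample.
labels = ["toxic", "toxic", "non-toxic", "non-toxic", "toxic", "non-toxic"]
expression = {
    "gene_A": [2.0, 2.3, 0.1, -0.2, 1.9, 0.0],
    "gene_B": [0.5, -0.4, 0.6, -0.5, -0.3, 0.4],
}
train_idx = [0, 1, 2, 3]  # samples defined as the training set
test_idx = [4, 5]         # samples defined as the test set

predictive = mine_predictive_genes(expression, labels, train_idx, test_idx)
print(predictive)
```

Repeating the selection with additional training and test sets, and recording how often each gene is selected, would be one way to carry out the optional step of generating additional sets.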

[18] In another aspect, the invention provides a computer program product for 

predicting toxicity that includes a set of toxicity predictive genes derived from mining 
a database having a plurality of gene expression profiles indicative of toxicity. In a 
further aspect, the set of toxicity predictive genes includes at least one toxicity 
predictive gene from the Combination 5, 4, 3, 2, or 1 list.

[19] In another aspect, the invention provides a library of expression profiles of 

toxicity predictive genes produced by the methods disclosed herein. 

[20] In another aspect, the invention provides an integrated system for predicting 

toxicity including equipment capable of measuring gene expression profiles of toxicity 
predictive genes from biological samples exposed to a test agent, operably linked to 
a computer system capable of implementing a predictive model. 

Brief Description of the Drawings 

[21] Figure 1 is a flow diagram illustrating one embodiment of the present 

invention for identification of toxicity predictive genes. 

[22] Figure 2 is a flow diagram illustrating one embodiment of the present 
invention for evaluating performance of toxicity predictive genes.

[23] Figure 3 is a flow diagram illustrating one embodiment of the present 

invention for using toxicity predictive genes to predict toxicity. 

[24] Figure 4 is a graph that illustrates one embodiment of the present invention 

showing the percent of overall correct calls as a function of number of predictor 
genes — histopathology correlating genes (Pearson correlation measure) with training 
and test set 3. The percent of overall correct calls is presented as a function of the 
number of predictor genes. The input genes list was a list of 61 genes that correlated 
with histopathology scores using Pearson's correlation measure (r-value >0.45). 
Training and Test Set 3 was used with other model values of 10 nearest neighbors 
and a p-value ratio cutoff of 0.5. An optimum gene number of 9 was observed 
(lowest number of genes giving the highest percent overall calls) for this case. 
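
The two selection rules described for Figure 4 can be illustrated with a short sketch. The first function keeps genes whose Pearson correlation with histopathology scores exceeds a cutoff (0.45 in the case described), and the second picks the lowest number of genes giving the highest percent of overall correct calls. All gene names, expression values, scores, and percentages below are hypothetical and are not taken from the Tables or the Figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def correlated_genes(expression, scores, r_cutoff=0.45):
    """Return (gene, r) pairs with r above the cutoff, highest r first."""
    ranked = []
    for gene, values in expression.items():
        r = pearson_r(values, scores)
        if r > r_cutoff:
            ranked.append((gene, r))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def optimum_gene_number(percent_correct_by_n):
    """Lowest number of genes that attains the highest percent correct calls."""
    best = max(percent_correct_by_n.values())
    return min(n for n, pct in percent_correct_by_n.items() if pct == best)


# Hypothetical histopathology scores and expression values for five samples.
histopathology_scores = [0, 0, 1, 2, 3]
expression = {
    "gene_A": [0.1, 0.0, 0.9, 2.1, 2.9],  # rises with the score
    "gene_B": [1.0, 0.2, 0.8, 0.1, 0.9],  # little relationship to the score
    "gene_C": [3.0, 2.8, 2.0, 1.1, 0.2],  # falls as the score rises
}
selected = correlated_genes(expression, histopathology_scores)
print(selected)

# Hypothetical percent overall correct calls for several predictor gene counts.
print(optimum_gene_number({3: 80.0, 9: 92.0, 15: 92.0, 30: 88.0}))
```

Genes that correlate inversely with the scores, such as the hypothetical gene_C, are excluded by this one-sided cutoff and would need a separate filter on negative r values.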

[25] Figure 5 is a graph that illustrates K Means and Tree Clustering for Combo 

5, 4, 3, 2 Genes. Cluster patterns are shown for an 8 cluster analysis of predictive
genes from the Combo 5, 4, 3, and 2 genes that corresponds to one embodiment of
the invention. The individual genes located in each of the 8 clusters are presented in 
Table 30. 

Brief Description of the Tables 

[26] Table 1 lists compounds, dose levels, pathology and abbreviations in the 

database in accordance with one embodiment of the present invention. 

[27] Table 2 lists distribution of compounds in individual training and test sets for 

24 hour data in accordance with one embodiment of the present invention. 

[28] Table 3 lists genes whose expression at 24 hour directly correlates with

necrosis at 72 hour, ranked by Pearson correlation coefficient in accordance with one 
embodiment of the present invention. 

[29] Table 4 lists genes whose expression at 24 hour inversely correlates with

necrosis at 72 hour, ranked by Spearman correlation coefficient in accordance with 
one embodiment of the present invention. 

[30] Table 5 lists predictive genes for 24 hour expression data in accordance with 

one embodiment of the present invention. 

[31] Table 6 lists randomly selected gene subsets from 24 hour Combo All gene 

set in accordance with one embodiment of the present invention. 

[32] Table 7 lists randomly selected gene subsets from 24 hour Combos 5, 4, 3 

combined in accordance with one embodiment of the present invention. 

[33] Table 8 lists randomly selected gene subsets from 24 hour all excluding 

predictive genes (i.e., excluding Combo All genes) in accordance with one
embodiment of the present invention. 

[34] Table 9 lists toxicity individual sample prediction values for 24 hour data 

predictive genes (combined list and subsets) in accordance with one embodiment of 
the present invention. 

[35] Table 10 lists toxicity compound-dose prediction values for 24 hour data 

predictive genes (combined list and subsets) in accordance with one embodiment of 
the present invention. 

[36] Table 11 lists toxicity compound prediction values for 24 hour data predictive

genes (combined list and subsets) in accordance with one embodiment of the 
present invention. 

[37] Table 12 lists individual gene predictions for Combo 5 in accordance with 

one embodiment of the present invention. 

[38] Table 13 lists individual gene predictions for Combo 4 in accordance with 

one embodiment of the present invention. 

[39] Table 14 lists individual gene predictions for Combo 3 in accordance with

one embodiment of the present invention. 

[40] Table 15 lists toxicity compound-dose prediction values for 24 hour data with 

random gene subsets in accordance with one embodiment of the present invention. 

[41] Table 16 lists comparison of predictivity for correct toxicity classification and 

random classification using Combo gene sets and random subsets and 24 hour data 
in accordance with one embodiment of the present invention. 

[42] Table 17 lists distribution of compounds in individual training and test sets for 

6 hour data in accordance with one embodiment of the present invention. 

[43] Table 18 lists genes whose expression at 6 hours directly correlates with 

hepatocellular necrosis at 72 hours, ranked by Pearson correlation coefficient in 
accordance with one embodiment of the present invention. 

[44] Table 19 lists genes whose expression at 6 hours inversely correlates with 

necrosis at 72 hours, ranked by Spearman correlation coefficient in accordance with 
one embodiment of the present invention. 

[45] Table 20 lists genes whose expression at 6 hours is predictive of toxicity at 

72 hours in accordance with one embodiment of the present invention. 

[46] Table 21 lists toxicity compound-dose prediction values for 6 hour data 

predictive genes (combined list and subsets) in accordance with one embodiment of 
the present invention. 

[47] Table 22 lists comparison of predictivity for correct toxicity classification and 

random classification using combo gene sets 6 hour data in accordance with one 
embodiment of the present invention. 

[48] Table 23 lists distribution of compounds in individual training and test sets for 

72 hour data in accordance with one embodiment of the present invention. 

[49] Table 24 lists genes whose expression at 72 hours directly correlates with

necrosis at 72 hours, ranked by Pearson correlation coefficient in accordance with 
one embodiment of the present invention. 

[50] Table 25 lists genes whose expression at 72 hours inversely correlates with 

necrosis at 72 hours, ranked by Spearman correlation coefficient in accordance with 
one embodiment of the present invention. 

[51] Table 26 lists genes whose expression at 72 hours is predictive of toxicity at 

72 hours in accordance with one embodiment of the present invention. 

[52] Table 27 lists toxicity compound-dose prediction values for 72 hour data 

predictive genes (combined list and subsets) in accordance with one embodiment of 
the present invention. 

[53] Table 28 lists comparison of predictivity for correct toxicity classification and 

random classification using combo gene sets 72 hour data in accordance with one 
embodiment of the present invention. 

[54] Table 29 lists prediction of toxicity for samples external to database in 

accordance with one embodiment of the present invention. 

[55] Table 30 lists K-means cluster analysis of combo 5, 4, 3 and 2 gene set in 

accordance with one embodiment of the present invention. 

[56] Table 31 lists RCT genes (ESTs) predictive for necrosis at 72 hours: best 

homology matches in accordance with one embodiment of the present invention. 

[57] Table 32 lists genes predictive for necrosis, sequences, and accession 

numbers in accordance with one embodiment of the present invention. 

[58] Table 33 lists hepatocellular necrosis predictive genes whose protein 

products are known to be secreted. The genes are from the table listing 
hepatocellular necrosis predictive genes at the three time points 6, 24 and 72 hours. 
The protein products are easier to access since they are secreted into body fluids
and are thus more amenable to be quantified. Therefore these proteins can be
monitored in body fluids of subjects such as humans and toxicity predictions can be
made.

[59] Table 34 lists expression data for the 6 hour timepoint in accordance with 

one embodiment of the present invention. 

[60] Table 35 lists expression data for the 24 hour timepoint in accordance with 

one embodiment of the present invention. 

[61] Table 36 lists expression data for the 72 hour timepoint in accordance with 

one embodiment of the present invention. 

[62] Table 37 lists predictive performance of predictive genes organized by 

occurrence on training/test set lists (combo number) and time point in accordance 
with one embodiment of the present invention. 

[63] Table 38 lists 266 liver toxicity predictive genes organized by time point and 

combo class in accordance with one embodiment of the present invention. 

[64] Table 39 lists Liver Predictive genes that are predictive across all three time 

points in accordance with one embodiment of the present invention. 

[65] Table 40 lists Liver Predictive genes that are most predictive across all three 

time points in accordance with one embodiment of the present invention. 

Detailed Description 

[66] One embodiment of the present invention provides for a method of predicting 

the liver toxicity in an individual to an agent. The method comprises obtaining a 
biological sample from an individual treated with the agent. The expression of one or 
more liver toxicity predictive genes in the sample is measured, wherein the genes are 
selected from a group consisting of partial gene sequences of genes identified as 
responsive to agents causing liver necrosis. The process generates a test
expression profile. The test expression profile is used with a set of reference 
expression profiles in a Predictive Model to determine whether the agent will induce 
liver toxicity in the individual. 

[67] Another embodiment of the present invention provides for a method of 

predicting the liver toxicity of an agent. The method comprises using an in vitro
system, which comprises obtaining a biological sample from in vitro cultured cells
or explants treated with the agent. The expression of one or more liver toxicity 
predictive genes in the sample is measured. The genes are selected from a group 
consisting of partial gene sequences of genes identified as responsive to agents 
causing liver necrosis. The process generates a test expression profile. The test 
expression profile is used with a set of reference expression profiles in a Predictive 
Model to determine whether the agent will induce liver toxicity in the individual. 

[68] Yet another embodiment of the present invention provides for a process for 

predicting the liver toxicity in a biological sample from an individual, in-vitro cell 
cultures or explants to an agent via a programmable machine. The process 
comprises obtaining a biological sample treated with the agent. The expression of 
one or more liver toxicity predictive genes in the sample is measured. The genes are 
selected from a group consisting of partial gene sequences of genes identified as 
responsive to agents causing liver necrosis. The steps generate a test expression 
profile. The test expression profile is used with a set of reference expression profiles 
in a Predictive Model to determine whether the agent will induce liver toxicity in the 
individual. 

[69] Still another embodiment of the present invention provides a computer 

program product for enabling a computer to perform Predictive Model analysis for 
liver toxicity on a biological sample from an individual, in-vitro cell cultures or explants 
to an agent. The computer program product comprises software instructions 
enabling the computer to perform predetermined operations, and a computer 
readable medium embodying the software instructions. The pre-determined
operations comprise measuring an expression of one or more liver toxicity predictive 
genes in a sample, wherein the genes are selected from a group consisting of partial 
gene sequences of genes identified as responsive to agents causing liver necrosis. 
A test expression profile is thus generated. The test expression profile is used with a 
set of reference expression profiles in a Predictive Model to determine whether the
agent will induce liver toxicity in the individual. 

[70] Yet a further embodiment of the present invention provides a computer

system adapted to predict liver toxicity in a biological sample from an individual,
in-vitro cell cultures, or explants to an agent. The computer system comprises a
processor and a memory including software instructions adapted to enable the
computer system to perform operations. The software instructions comprise
measuring the expression of one or more liver toxicity predictive genes in the sample, 
wherein the genes are selected from the group consisting of partial gene sequences 
of genes identified as responsive to agents causing liver necrosis, thereby generating 
a test expression profile; and using the test expression profile with a set of reference 
expression profiles in a Predictive Model to determine whether the agent will induce 
liver toxicity in the individual. 

[71] A further embodiment of the present invention provides a computer program

product for predicting liver toxicity from a test sample expression profile. The 
computer program product comprises an encrypted training data set; encrypted lists 
of genes selected from genes predictive of liver toxicity to be used with the encrypted 
training data set, and a Predictive Model that uses the encrypted training data sets, 
the encrypted lists of genes, and the test sample expression profile to predict the liver 
toxicity of the test sample. 

[72] Another embodiment of the present invention provides a method for mining 

genes predictive for liver toxicity. The method comprises collecting expression levels 
of a plurality of candidate toxicity predictive genes among a multiplicity of samples. A
group of samples is defined as a training set. Another group of samples is
defined to be a test set. Optionally, additional training and test sets are generated. A
set of genes which are predictive of liver toxicity is selected based on evaluating the
training and test sets in a Predictive Model.

WO 03/085083, PCT/US03/10141

[73] This invention relates to methods of predicting whether an agent or other 

stimulus is capable of inducing toxicity in a recipient organism using predictive 
molecular toxicology analysis. In particular, the invention provides methods of 
predicting toxicity which comprise analyzing gene and/or protein expression profiles 
across a number of toxicity biomarkers disclosed herein for patterns of expression 
that are predictive of toxicity in the recipient organism. This type of toxicity is 
significant as a toxic effect of many chemical agents and is a significant component 
of adverse reactions to pharmaceuticals and drugs (see, for example, Treinen- 
Moslen, M. in Casarett and Doull's Toxicology: The Basic Science of Poisons Sixth 
Edition (C.D. Klaassen, ed.) Chapter 13, McGraw-Hill, New York, 2001). The
invention is based, in part, upon the discovery that modulated transcriptional 
regulation of relatively small sets of certain genes in response to a test agent can 
accurately predict the occurrence of toxicity observed at later time points. 

[74] Provided herein are multiple sets of toxicity biomarkers which are useful in 

the practice of the toxicity prediction methods of the invention. In particular, 
Applicants have identified 266 toxicity biomarkers that demonstrate utility in 
predicting toxicity outcomes. These biomarkers have been thoroughly characterized 
for their predictive performance, individually as well as in various combinations or 
subsets thereof. In addition, various optimized subsets of the toxicity biomarkers of 
the present invention are disclosed. These sets have also been thoroughly 
characterized for predictive performance using the methods of the invention. Among 
the subsets of toxicity genes provided herein are several which demonstrate 
prediction accuracies in the vicinity of 90%. 

[75] The present invention is further described by way of the experimental 
examples provided herein. These examples demonstrate that small sets of genes
(i.e., in some instances, as few as 1, 2 or 3 biomarker genes) are used to accurately 
predict toxicity. For example, as further described in the Examples, analysis of 
mRNA expression of only a few genes provides an accurate indication of whether a 
test agent will or will not induce toxicity. 

[76] The predictive capacity of the methods of the invention has been verified by

comparisons with random classifications, and data derived external to the database 
used to identify toxicity biomarkers. Moreover, the methods of the invention are 
capable of distinguishing between agent dose levels that induce toxicity (typically 
higher doses) and those doses that are non-toxic. This latter feature is an important 
component of meaningful toxicological evaluation. 
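
One way to picture the comparison with random classifications is sketched below. It assumes that "random classification" means reassigning the toxicity labels of the training samples at random and re-running the prediction, which is an interpretation and not a procedure stated in this paragraph. The one-nearest-neighbor rule, the sample values, the trial count, and the seed are all hypothetical.

```python
import random


def nearest_neighbor(train, profile):
    """Return the label of the training sample closest to the given profile."""
    closest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], profile)),
    )
    return closest[1]


def accuracy(train, test):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(nearest_neighbor(train, p) == label for p, label in test)
    return correct / len(test)


def shuffled_label_accuracies(train, test, n_trials=100, seed=0):
    """Test-set accuracy after randomly reassigning the training labels."""
    rng = random.Random(seed)
    profiles = [p for p, _ in train]
    labels = [label for _, label in train]
    results = []
    for _ in range(n_trials):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        results.append(accuracy(list(zip(profiles, shuffled)), test))
    return results


# Hypothetical two-gene expression profiles with toxicity classifications.
train = [
    ([2.1, 1.8], "toxic"), ([1.9, 2.2], "toxic"), ([2.4, 1.5], "toxic"),
    ([0.1, -0.2], "non-toxic"), ([-0.1, 0.3], "non-toxic"), ([0.2, 0.1], "non-toxic"),
]
test = [([2.0, 1.9], "toxic"), ([0.0, 0.1], "non-toxic")]

true_accuracy = accuracy(train, test)
random_accuracies = shuffled_label_accuracies(train, test)
mean_random = sum(random_accuracies) / len(random_accuracies)
print(true_accuracy, mean_random)
```

A gene set whose accuracy with the correct classifications clearly exceeds the spread of accuracies obtained with shuffled classifications would be consistent with predictivity better than chance.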

[77] The practice of the present invention will employ, unless otherwise indicated, 

conventional techniques of molecular biology (including recombinant techniques), 
microbiology, cell biology, biochemistry, nucleic acid chemistry, and immunology, 
which are well known to those skilled in the art. Such techniques are explained fully
in the literature, such as, Molecular Cloning: A Laboratory Manual, second edition
(Sambrook et al., 1989) and Molecular Cloning: A Laboratory Manual, third edition
(Sambrook and Russell, 2001), (jointly referred to herein as "Sambrook"); Current
Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987, including
supplements through 2001); PCR: The Polymerase Chain Reaction, (Mullis et al.,
eds., 1994); Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring
Harbor Publications, New York; Harlow and Lane (1999) Using Antibodies: A
Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
(jointly referred to herein as "Harlow and Lane"); Beaucage et al., eds., Current
Protocols in Nucleic Acid Chemistry (John Wiley & Sons, Inc., New York, 2000); and
Casarett and Doull's Toxicology: The Basic Science of Poisons, C. Klaassen, ed., 6th
edition (2001).

[78] Unless otherwise defined, all terms of art, notations and other scientific 
terminology used herein are intended to have the meanings commonly understood
by those of skill in the art to which this invention pertains. In some cases, terms with 
commonly understood meanings are defined herein for clarity and/or for ready 
reference, and the inclusion of such definitions herein should not necessarily be 
construed to represent a substantial difference over what is generally understood in 
the art. The techniques and procedures described or referenced herein are generally 
well understood and commonly employed using conventional methodology by those 
skilled in the art, such as, for example, the widely utilized molecular cloning 
methodologies described in Sambrook et al., Molecular Cloning: A Laboratory
Manual, 2nd edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
N.Y. As appropriate, procedures involving the use of commercially available kits and
reagents are generally carried out in accordance with manufacturer defined protocols 
and/or parameters unless otherwise noted. 

[79] "Toxic" or "toxicity" refers to the result of an agent causing adverse effects, 

usually by a xenobiotic agent administered at a sufficiently high dose level to cause 
the adverse effects. 

[80] As used herein, the terms "toxicity biomarker" and "toxicity predictive gene"

are used interchangeably and refer to a gene whose expression, measured at the
RNA or protein level, can predict the likelihood of a toxicity response with accuracy
significantly better than would occur by chance. A toxicity response can be necrosis or
any other toxicity manifestations that elicit similar detectable gene expression 
changes. These could include, but are not limited to, other forms of pathology such 
as centrilobular hepatocellular vacuolar degeneration, apoptosis, inflammation and 
cirrhosis. 

[81] A "toxicological response" or "toxicity response" refers to a cellular, tissue,

organ or system level response to exposure to an agent. At the molecular level, this 
can include, but is not limited to, the differential expression of genes encompassing 
both the up- and down-regulation of expression of such genes at the RNA and/or 
protein level; the up- or down-regulation of expression of genes which encode
proteins associated with response to and mitigation of damage, the repair or 
regulation of cell damage; or changes in gene expression due to changes in 
populations of cells in the tissue or organ affected in response to toxic damage. 

[82] An "agent" or "compound" is any element to which an individual can be 

exposed and can include, without limitation, drugs, pharmaceutical compounds, 
household chemicals, industrial chemicals, environmental chemicals, other 
chemicals, and physical elements such as electromagnetic radiation. 

[83] The term "biological sample" as used herein refers to substances obtained 

from an individual. The samples may comprise cells, tissue, parts of tissues, organs, 
parts of organs, or fluids (e.g., blood, urine or serum). Biological samples include, 
but are not limited to, those of eukaryotic, mammalian or human origin. 

[84] "Sample" is defined for the purposes of prediction as a biological sample and 

the gene expression data for that sample. Each sample may come from an individual 
animal. A toxicity classification may also be associated with the sample. 

[85] "Gene expression" as used herein refers to the levels of expression and/or 

pattern of expression of a gene. 

[86] "Gene expression profile" refers to the levels of expression of multiple 

different genes measured for the same sample. Gene expression profiles may be 
measured in a sample, such as samples comprising a variety of cell types, different 
tissues, different organs, or fluids (e.g., blood, urine, spinal fluid, sweat, saliva or 
serum) by various methods including but not limited to microarray technologies and 
quantitative and semi-quantitative RT-PCR (e.g., Taqman™) techniques, as well as 
techniques for measuring expression of proteins. 

[87] "Individual" refers to a vertebrate, including, but not limited to, a human, non-

human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and
dog.

[88] As used herein, the terms "hybridize", "hybridizing", "hybridizes" and the like,

used in the context of polynucleotides, are meant to refer to conventional 
hybridization conditions, such as hybridization in 50% formamide/6X SSC/0.1% 
SDS/100 µg/ml ssDNA, in which temperatures for hybridization are above 37
degrees Celsius and temperatures for washing in 0.1X SSC/0.1% SDS are above 55 
degrees Celsius, and preferably to stringent hybridization conditions. The 
hybridization of nucleic acids can depend upon various factors such as their degree 
of complementarity as well as the stringency of the hybridization reaction conditions. 
Stringent conditions can be used to identify nucleic acid duplexes with a high degree 
of complementarity. Means for adjusting the stringency of a hybridization reaction 
are well known to those of skill in the art. See, for example, Sambrook, et al.,
"Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor
Laboratory Press, 1989; Ausubel, et al., "Current Protocols In Molecular Biology,"
John Wiley & Sons, 1996 and periodic updates; and Hames et al., "Nucleic Acid
Hybridization: A Practical Approach," IRL Press, Ltd., 1985. In general, conditions
that increase stringency (i.e., select for the formation of more closely matched
duplexes) include higher temperature, lower ionic strength and presence or absence 
of solvents; lower stringency is favored by lower temperature, higher ionic strength, 
and lower or higher concentrations of solvents. 

[89] In the context of amino acid sequence comparisons, the term "identity" is 

used to express the percentage of amino acid residues at the same relative position 
that are the same. Also in this context, the term "homology" is used to express the 
percentage of amino acid residues at the same relative positions which are either 
identical or are similar, using the conserved amino acid criteria of BLAST analysis, as 
is generally understood in the art. Further details regarding amino acid substitutions, 
which are considered conservative under such criteria, are provided below. 

[90] The toxicity biomarkers described herein were initially identified utilizing a 

database generated from large numbers of in vivo experiments, wherein the 
differential expression of approximately 700 rat genes, measured at various time
points, in response to multiple toxic compounds inducing various specific toxic
responses, as visualized through microscopic histopathological analysis, was 
quantified, as described in pending United States Patent Application filed January 29, 
2002 (serial number 10/060,893). This quantitative gene expression data, as well as 
corresponding histopathological information, were then subjected to an analytical 
approach specifically designed to identify genes which not only correlated with the 
observed histopathology, but also demonstrated an ability to be used in a model 
capable of accurately predicting the occurrence of the toxic response associated with 
the observed histopathology. A detailed description of an identification process is 
presented in the Examples. 

[91] A flow diagram illustrating how the toxicity biomarkers of the invention were
identified is shown in Figure 1. In addition to the database described and utilized
herein, other toxicology gene expression databases may be generated, and used to 
identify additional toxicity biomarkers, which may also be employed in the practice of 
the toxicity prediction methods of the invention. Such databases may be generated 
with test compounds capable of inducing various pathologies indicative of a toxic 
response in the liver and/or other organs or systems, over different time periods and
under different administration and/or dosing conditions, including without limitation 
necrosis, centrilobular hepatocellular vacuolar degeneration, apoptosis, inflammation 
and cirrhosis. An example of compounds, dose levels, toxicity classifications and 
histopathology scores used in the Examples that follow is provided in Table 1. Such
databases may be generated using organisms other than the rat, including without 
limitation, animals of canine, murine, or non-human primate species. In addition, 
such databases may incorporate data derived from human clinical trials and post- 
approval human clinical experiences. Various methods for detecting and quantitating 
the expression of genes and/or proteins in response to toxic stimuli may be employed 
in the generation of such databases, as are generally known in the art. For example, 
microarrays comprising multiple cDNAs or oligonucleotide probes capable of 
hybridizing to corresponding transcripts of genes of interest may be used to generate 
gene expression profiles. Additionally, a number of other methods for detecting and
quantitating the expression of gene transcripts are known in the art and may be
employed, including without limitation, RT-PCR techniques such as TaqMan®, 
RNAse protection, branched chain, etc. 

[92] Databases comprising quantitative gene expression information preferably 

include qualitative and quantitative and/or semi-quantitative information respecting 
the observed toxicological responses and other conventional toxicology endpoints, 
such as for example, body and organ weights, serum chemistry and histopathology 
observations, histopathology scores and/or similar parameters. 

[93] For the purpose of identifying candidate predictive genes, the database 

preferably includes histopathology scores for each animal that has been exposed to 
one or more agent(s). These scores can be assigned based on actual 
histopathology observations for the tissue and animal or on the basis of effects 
observed for other animals treated with the same agent and dose level. The scores 
are numerical scores that reflect the occurrence and severity of histopathological 
changes. These scores can be adjusted to have similar range to gene expression 
changes. For example, a score of 1 could be assigned to samples with no changes
and scores of 2-8 assigned to increasingly severe changes. Because the scores are
numerical, they are suitable for use with a variety of statistical correlation and 
similarity measures. 
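
By way of non-limiting illustration, the numerical scoring described above may be sketched as follows. Only the score of 1 for no change and the range of 2-8 for increasingly severe changes come from the text; the severity grade names and their individual scores are hypothetical and are chosen solely to make the sketch concrete.

```python
# Hypothetical severity grades. The text fixes only score 1 for "no change"
# and scores 2-8 for increasingly severe changes; the grade names and the
# specific values below are assumptions for illustration.
SEVERITY_TO_SCORE = {
    "none": 1,
    "minimal": 2,
    "mild": 4,
    "moderate": 6,
    "marked": 8,
}


def histopathology_score(grade: str) -> int:
    """Return the numerical score for a severity grade (case-insensitive)."""
    return SEVERITY_TO_SCORE[grade.lower()]


def group_score(grades: list[str]) -> float:
    """Mean score for animals treated with the same agent and dose level."""
    scores = [histopathology_score(g) for g in grades]
    return sum(scores) / len(scores)
```

A group-level score of this kind corresponds to assigning a score on the basis of effects observed for other animals treated with the same agent and dose level.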

[94] An example of a histopathology scoring system is provided in Example 1.

Referring to Figure 1, histopathology scores may be utilized to identify genes which
correlate with the observed toxicological response, using any number of statistical 
correlation and similarity analysis techniques, including without limitation those 
correlation or similarity measures described or employed in Example 1 (e.g., 
Pearson, Spearman, change, smooth, distance etc.). Such correlating genes may be 
used as predictive gene candidates. Examples of genes whose expression at 24 
hours after treatment correlates with histopathology observed at 72h are detailed in 
Tables 3 and 4. In one embodiment, the correlating gene lists as well as the entire
array gene list are used as input gene lists in the GeneSpring™ Predictive Model
(otherwise known hereafter as "Predictive Model"). 
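
As a minimal sketch of how correlating genes might be selected, the following computes Pearson and Spearman correlations between each gene's expression values and the histopathology scores of the same samples. The function names, the data layout and the cutoff of 0.8 are assumptions made for illustration; the sketch does not reproduce the GeneSpring™ implementation or the other similarity measures named above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlating_genes(expression, scores, cutoff=0.8):
    """Genes whose |Pearson r| with the scores meets a hypothetical cutoff.

    expression: {gene name: [expression value per sample]}
    scores:     [histopathology score per sample], in the same sample order
    """
    return sorted(
        gene for gene, values in expression.items()
        if abs(pearson(values, scores)) >= cutoff
    )
```

For a 24 hour list of the kind described above, the expression values would be those measured at 24 hours and the scores would be those observed at 72 hours for the same treatment.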

[95] Statistical analysis of the database of gene expression profiles can be 

effected by utilizing commercially available software programs. In one embodiment, 
GeneSpring™ (Version 4.1, Silicon Genetics, Redwood City, CA) is used. Other 
software programs that can be used for statistical analysis are SAS software 
packages (SAS Institute Inc., Cary, NC) and S-PLUS® software (Insightful
Corporation, Seattle, WA). 

[96] Using GeneSpring™ software, class predictions can be made from the genes 

in the database, as detailed in Example 1, using one or more training and test sets. 
In one embodiment, five training sets and five test sets are obtained, as shown in 
Example 1 (Table 2). Toxicological classifications are entered for the samples in 
each training and test set. Toxicological classifications can be defined by various 
pathologies. In one embodiment, the toxicity is defined as necrosis observed 72 
hours after treatment with an agent. However, toxicity can manifest in other 
pathologies such as centrilobular hepatocellular vacuolar degeneration, apoptosis, 
inflammation and cirrhosis. 

[97] Once the training sets have been selected, then predicted classifications of 

the test set samples are obtained by using a k-nearest neighbor (or knn) voting
procedure. The class of each of the k nearest neighbors is determined, and the test sample is
assigned to the class with the largest representation after adjusting for the proportion
of classifications in the training set. In one embodiment, adjustments are made to 
account for different proportions of classes in the training set. 
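
The voting procedure may be sketched, under stated assumptions, as follows. Euclidean distance, the default of k = 3 and the particular adjustment (dividing each class's votes by that class's count in the training set) are illustrative choices; the text does not specify the distance measure or the exact adjustment used by the Predictive Model.

```python
from collections import Counter
from math import dist


def knn_predict(train, test_profile, k=3):
    """Classify one test sample by proportion-adjusted k-nearest neighbor voting.

    train:        list of (expression profile, class label) pairs
    test_profile: expression profile of the test sample (same gene order)
    """
    class_counts = Counter(label for _, label in train)
    # The k training samples closest to the test sample (Euclidean distance).
    nearest = sorted(train, key=lambda sample: dist(sample[0], test_profile))[:k]
    votes = Counter(label for _, label in nearest)
    # Assumed adjustment: weight votes by the inverse class size so that an
    # over-represented class in the training set does not dominate the vote.
    adjusted = {label: votes[label] / class_counts[label] for label in votes}
    return max(adjusted, key=adjusted.get)
```

With this adjustment, two "yes" neighbors drawn from a small toxic class can outweigh a larger raw count of "no" neighbors drawn from a much larger non-toxic class.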

[98] Toxicity can also be observed at various time points after exposure to an 

agent and is not limited to only 72 hours after treatment. A skilled toxicologist can
determine the optimal time after exposure to an agent to observe pathology by either 
what has been disclosed in the art or a stepwise experimentation with time 
increments, for example 2, 4, 6, 12, 18, 24, 36, 48 hours post-exposure or even
longer time increments, for example, days, weeks, or months after exposure to the
agent. 

[99] Figure 1 describes the overall process used to identify toxicity predictive 

genes. In one embodiment, this process was run independently for each time point. 

[100] The number of genes that are to be used in the Predictive Model can be 

varied, for example 50, 40, 30, 20, 10, 5, 2, or 1 gene(s) can be used. In a preferred 
embodiment, at least 50 genes are used.

[101] An optimal gene list is generated that yields the best predictive accuracy

with the lowest number of genes used. Figure 2 shows an exemplary profile for an 
optimal gene list. 
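
One simple reading of this selection rule is sketched below: among the gene counts tried, take the smallest count that attains the best observed accuracy. The function name and the accuracy values used in any call are hypothetical and are not taken from Figure 2.

```python
def optimal_gene_count(accuracy_by_count):
    """Smallest number of genes that reaches the best observed accuracy.

    accuracy_by_count: {number of genes used: predictive accuracy}
    """
    best = max(accuracy_by_count.values())
    return min(n for n, acc in accuracy_by_count.items() if acc == best)
```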

[102] In another embodiment of the present invention, the optimum gene lists for

all input gene lists are combined for each training and test set, and these
combined lists for the five training and test sets are then merged to create an aggregate list of
predictive genes. The aggregate list can then be subdivided to smaller lists of genes 
based on the number of times that the genes occurred on the predictive gene lists for 
an individual training or test set. These are designated herein as Combo 5, 4, 3, 2, or 
1 lists. The genes that were predictive in 5 training and test sets are designated as 
Combo 5 and the genes that were predictive in 4 of 5 training and test sets are 
designated as Combo 4 and so forth. Table 32 presents gene names, accession 
numbers and sequence information for the toxicity predictive genes found by analysis 
of the database in the manner described above. Each of these genes has been 
demonstrated to contribute to predictive performance for at least one input gene list 
and training/test set and one time point. Table 38 lists the toxicity predictive genes 
organized by time point and Combo Class. Table 31 lists homologous genes for the 
RCT sequences that were identified by BLAST search using the GenBank NR 
database as the target database. 
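
The Combo designation amounts to counting, for each gene, the number of training and test sets in which it appeared on the predictive list. A minimal sketch follows; the function name and the gene identifiers in any call are hypothetical.

```python
from collections import Counter


def combo_lists(predictive_lists):
    """Group genes by the number of training/test sets in which they were predictive.

    predictive_lists: one collection of predictive gene names per
                      training/test set (five in the Examples).
    Returns {occurrence count: sorted gene names}, so that key 5 holds the
    Combo 5 genes, key 4 the Combo 4 genes, and so forth.
    """
    counts = Counter(gene for genes in predictive_lists for gene in set(genes))
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}
```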

[103] The predictive genes can also be categorized by their occurrence as
predictive at different time points. Table 39 lists genes that are on the combined
predictive lists of three time points tested. This list is derived from the list of the 
predictive genes measured at 6, 24 and 72 hours that predicted necrosis at 72 hours. 
Genes that are predictive at multiple time points can be further grouped by their 
Combo ranking. Table 40 lists genes that are the most predictive across the three 
time points tested. This list is a subset of the list of 9 genes that are predictive across 
three time points 6, 24 and 72 hours. The criteria for inclusion in this table were that 
the gene be a member of the highest combinations, viz., combinations 5 or 4 in at 
least 2 out of three time points. The gene expression data of the genes in Table 40 
could be expected to be very highly predictive of necrosis. Further, since the 
predictive strength of these genes is very high across the 3 time points tested, it 
could be expected that gene expression data derived from these genes even at time 
points not tested such as any time points falling between 6 and 72 hours or any other 
time point would be very highly predictive of necrosis. These specific genes could be 
useful in cases where the dose route or pharmacokinetic properties of a compound 
may alter the kinetics of predictive gene expression changes. 
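
The inclusion criteria stated above (predictive at all three time points, and a member of Combo 5 or Combo 4 in at least two of them) may be expressed as the following filter. The data layout and gene identifiers are assumptions for illustration and do not reproduce Table 40.

```python
def most_predictive(combo_rank, time_points=(6, 24, 72)):
    """Genes predictive at every time point with Combo 5 or 4 in at least two.

    combo_rank: {gene: {time point in hours: Combo class (1-5)}}; a time
                point is absent when the gene was not predictive there.
    """
    selected = []
    for gene, by_time in combo_rank.items():
        if set(by_time) != set(time_points):
            continue  # not predictive across all three time points
        high = sum(1 for rank in by_time.values() if rank >= 4)
        if high >= 2:
            selected.append(gene)
    return sorted(selected)
```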

[104] The predictive genes are evaluated for predictive performance as illustrated 

in Figure 2. For each gene list prediction, a table of data is generated using the 
Predictive Model which includes: the test set containing information about the actual 
call (i.e., "yes" or "no" for toxicity), the predicted call (i.e., "yes" or "no" for toxicity),
and the P-value cutoff ratio. Expression data that can be used with the K-nearest 
neighbor model and predictive genes to enable one skilled in the art to make 
predictions are given in Tables 34-36. 
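
Given the actual and predicted calls for a test set, predictive accuracy can be tallied as in the sketch below. The function names are hypothetical, and the P-value cutoff ratio reported by the Predictive Model is not modeled here.

```python
def prediction_accuracy(actual_calls, predicted_calls):
    """Fraction of test samples whose predicted call matches the actual call."""
    if len(actual_calls) != len(predicted_calls):
        raise ValueError("call lists must have the same length")
    hits = sum(a == p for a, p in zip(actual_calls, predicted_calls))
    return hits / len(actual_calls)


def confusion_counts(actual_calls, predicted_calls):
    """Tally (actual, predicted) pairs; ("yes", "no") is a missed toxic sample."""
    counts = {}
    for pair in zip(actual_calls, predicted_calls):
        counts[pair] = counts.get(pair, 0) + 1
    return counts
```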

[105] The combined list of predictive genes or alternatively, Combo 5, 4, 3, 2, or 1 

list or subsets thereof is used as input into the Predictive Model. As an external 
verification of the predictive abilities of the genes found to be predictive for toxicity, 
random lists of genes may be generated and also used as input into the Predictive 
Model. Example 2 describes the evaluation of the predictive performance of the 
toxicity predictive genes.

[106] Predictive performance may also be assessed using data from different time

points after exposure to the agent. In one embodiment, 24 hour expression data is 
used. In another embodiment, 6 hour expression data is used, as described in 
Examples 3 and 4. In another embodiment, 72 hour expression data is used, as 
described in Examples 5 and 6. As shown in Table 37, predictive capability for 24
hour expression data has a high accuracy rate (i.e., 92% accuracy) when the entire 
predictive gene list is used. 

[107] Somewhat lower predictive accuracies were observed for the 6h and 72h 

data but the prediction was still quite significant. All of the combo lists as well as 
Combo All list had significantly higher accuracy than using random classifications. 

[108] Predictive performance may also be assessed using subsets of genes from 

the different Combo lists. As indicated in Examples 2, 4 and 6, randomly selected
subsets of the Combo gene lists had very good predictive performance (accuracy
better than 80% and approaching 90%), and even individual genes had mean
predictive accuracies that were significant (for example, greater than 80%). In one
embodiment, using 5 genes from Combo All yields about 89% accuracy. Using 
different Combo lists may require a greater number of genes to reach the same 
accuracy level. 
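
The assessment of randomly selected subsets may be sketched as follows. The `evaluate` callable is a hypothetical stand-in for running a predictive model on a given gene subset and returning its accuracy; the number of subsets and the fixed seed are likewise assumptions chosen to make the sketch reproducible.

```python
import random
from statistics import mean


def mean_subset_accuracy(gene_list, subset_size, evaluate, n_subsets=100, seed=0):
    """Mean accuracy over randomly drawn subsets of a gene list.

    gene_list:   genes of a Combo list (or of the combined list)
    subset_size: number of genes drawn without replacement per subset
    evaluate:    hypothetical callable mapping a list of genes to an accuracy
    """
    rng = random.Random(seed)  # seeded so repeated runs draw the same subsets
    return mean(
        evaluate(rng.sample(gene_list, subset_size)) for _ in range(n_subsets)
    )
```

Repeating the call for increasing subset sizes gives a curve of mean accuracy against the number of genes used, which is one way to compare different Combo lists.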

[109] The toxicity predictive genes disclosed herein and toxicity predictive genes 

identified by using methods disclosed herein are useful for predicting toxicity in 
response to exposure to one or more agents. 

[110] The discovery that relatively small sets of different genes have predictive 

value permits flexible applications. The choice of how many and which genes to use 
can be tailored to a variety of different purposes. Very good predictivity is observed 
for sets of a few genes (for example, 24 hour Combo 5 which has only 3 genes had a 
mean prediction accuracy of about 90%). These small sets may be particularly 
advantageous in applications where measurement of only a few RNA species has 
considerable advantages in terms of sample processing logistics, speed and cost.
These applications would include relatively high throughput screens for predictive
capability. An example of this would be an early screen using small samples of 
primary cells or cultured cell lines that can be processed with automated robotic 
equipment for treatment and isolation of RNA followed by efficient technologies for 
measuring expression of a few RNA species such as branched chain technology or 
RT-PCR. 

[111] The use of larger numbers of predictive genes provides redundancy that may 

improve accuracy and precision. Applications using larger numbers of predictive 
genes might be tests of candidates at later stages of commercial development. An 
example would be later stages of preclinical development of a therapeutic candidate 
where in vivo samples can be obtained and more comprehensive methods such as 
microarray measurement of gene expression are appropriate. The larger gene sets 
can also include different subsets of genes which may offer more insight into 
potential mechanisms of toxicity and the ability to have refined predictions of long 
term toxic consequences such as chronic, irreversible toxicity or carcinogenicity. 

[112] Some members of the toxicity predictive genes may also be suitable for 

prediction of toxicity in other organs or may be preferable for predicting toxicity for 
wider ranges of timepoints or treatment routes or regimens. As an example of the 
latter, some of the predictive genes are observed at three different timepoints after 
treatment. These genes may be useful for prediction in cases where the samples 
come from treatment protocols that have different measurement timepoints or routes 
of administration than those employed for the database used in the discovery of the 
predictive genes disclosed herein or where the toxicokinetics for a particular agent 
are known or suspected to be different from those in the database. 

[113] In one embodiment, the agent is an agent for which no expression profile has 

been assessed or stored in the database or library. An animal, e.g., rat, is dosed with 
such an agent and the gene expression profile(s) is the test set for the Predictive 
Model. The training set which is used in the Predictive Model in this case can be the
entire database of sample array data because the test set data is not present in the
database. As described in Example 8, the prediction can be made with accuracy 
without the use of histopathology scores as part of the input into the Predictive 
Model. 

[114] In another embodiment the agent is an agent present in the database but is 

used at a different dose level or with a different treatment protocol than used in the 
database. The training set which is used in the Predictive Model in this case can be 
the entire database of sample array data because the test set data is not present in 
the database. As described in Example 8, the prediction can be made with accuracy 
without the use of histopathology scores as part of the input into the Predictive 
Model. 

[115] In another embodiment, the exposure time of the agent is not 6, 24, or 72 

hours or repeat dosing protocols are used. In this case, the skilled artisan can use 
the predictive toxicity genes from surrounding time points to extrapolate the predicted 
toxicity without undue experimentation. For example, if the individual has been 
exposed to the agent for 12 hours, then predictive genes from 6 and 24 hours 
timepoints are used as guidelines for extrapolating toxicity predictions. 

[116] In another embodiment, the toxicity predictive genes and a predictive model 

can be used to determine the presence or absence of a no-observed toxicity effect 
level. An agent can be used at different treatment levels and expression profiles 
obtained for each treatment level. The predictive genes and predictive model can be 
used to determine which dose levels elicit a response that is predicted to be toxic and 
which dose levels are not toxic. In contrast to conventional endpoints for determining 
no-effect levels, the use of expression data, predictive genes and predictive models 
applies a number of quantitative endpoints and criteria instead of subjective 
endpoints and criteria. This permits more rigorous and precisely defined 
determination of no effect levels. 
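
As a minimal sketch, a predicted no-effect level can be read from per-dose predictions as the highest dose predicted non-toxic that lies below the lowest dose predicted toxic. This rule, the function name and the dose values in any call are assumptions for illustration; the text does not prescribe a particular rule.

```python
def no_effect_level(predictions):
    """Highest dose predicted non-toxic below the lowest dose predicted toxic.

    predictions: {dose level: "yes" or "no" predicted toxicity call}
    Returns None when no such dose exists.
    """
    toxic = [dose for dose, call in predictions.items() if call == "yes"]
    lowest_toxic = min(toxic) if toxic else float("inf")
    safe = [
        dose for dose, call in predictions.items()
        if call == "no" and dose < lowest_toxic
    ]
    return max(safe) if safe else None
```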

[117] In another embodiment, the toxicity predictive genes can be used to detect
toxic effects that may be manifested as long lasting or chronic consequences such as
irreversible toxicity or carcinogenesis. The predictive genes and model can be 
applied to databases where classifications of training and test set samples are made 
with respect to actual or putative endpoints such as irreversible toxicity or 
carcinogenicity. 

[118] In another embodiment, the predictive genes can be used in a variety of 

alternative models to predict toxicity. Some of these models do not require the direct 
use of data in a database but use functions or coefficients derived from the database. 
In another embodiment, the predictive genes and models may be used to evaluate in 
vitro systems for their ability to reflect in vivo toxic events and to use such in vitro 
systems for predicting in vivo toxicity. Expression profiles for predictive genes can be 
created from candidate in vitro assays using treatments with agents of known in vivo 
toxicity and for which in vivo data on gene expression are available. The expression 
data and predictive models of this invention can be used to determine whether the in 
vitro assay system has predictive gene expression responses that accurately reflect 
the in vivo situation. Large sets of predictive genes as described in this invention can 
be tested in such models for their suitability and performance with the candidate in 
vitro systems. This is a superior and novel tool for evaluating and optimizing in vitro 
systems for their ability to reflect and accurately predict in vivo responses. 

[119] In another embodiment, the predictive genes and models may be used with 

an in vitro system to accurately predict in vivo toxicity. In vitro systems that have 
been evaluated and optimized as described in the previous embodiment are treated 
with test agents and expression profiles are measured for predictive genes. The 
expression profiles are used in conjunction with a predictive model to predict in vivo 
toxicity. In this embodiment, there can be considerable reduction in the use of 
laboratory animals. Additionally the application of this embodiment to in vitro human 
systems can provide a unique capability to accurately predict human toxic responses 
without human in vivo exposure or treatment.

[120] In another embodiment, measurement of the expression levels of the

proteins encoded by the predictive genes can be used in conjunction with predictive 
models to predict toxicity. Among the full set of toxicity predictive genes are various 
genes known to encode cell surface, secreted and/or shed proteins. This enables 
the development of methods for predicting toxicity using protein biomarkers. For 
example, as disclosed in Table 33, there are 19 genes in the master predictive set 
which are known to encode secreted proteins. Thus, in another aspect of the present 
invention, toxicity predictive assays that detect the expression of one or more of said 
predictive proteins are developed. Such assays have several advantages, such as: 

[121] Ability to use archived tissue specimens such as preserved or embedded 

tissues that are not suitable for measurement of RNA expression. 

[122] Ability to examine predictive protein expression in tissue slides using in situ 

labeling and microscopic observation. This is useful for detecting predictive toxicity 
signals occurring in very small sub-populations of cells. 

[123] Ability to detect protein markers in specimens that can be readily obtained 

with little or no invasiveness (e.g., blood, urine, sweat, saliva). 

[124] Reduction in animal use in laboratory studies such that no sacrifice of 

animals is necessary to obtain tissue specimens when toxicity prediction can be made
with specimens that can be obtained without animal sacrifice or surgery. 

[125] Application for human use where tissue specimens cannot be obtained or are 

only obtained with great difficulty. 

[126] In another embodiment, the identified predictive genes can be considered as 

potential therapeutic targets when the genes are involved in toxic damage or repair 
responses whose expression or functional modification may attenuate, ameliorate or 
eliminate disease conditions or adverse symptoms of disease conditions. 

[127] In another embodiment, the predictive genes can be organized into clusters of
genes that exhibit similar patterns of expression by a variety of statistical procedures
commonly used to identify such coordinate expression patterns. Common
functional properties of these clustered genes can be used to provide insight into the 
functional relationship of the response of these genes to toxic effects. Common 
genetic properties of these genes (e.g., common regulatory sequences) may provide 
insight into functional aspects by revealing known or novel similarities in the coding
region of the genes. The presence of common known or novel signal transduction 
systems that regulate expression of the genes can also lead to insight as to the 
functional properties of the genes. The presence of common known or novel 
regulatory sequences in the identified predictive genes can also be used to identify 
toxicity predictive genes that are not present in the current Rat CT array. This can be 
accomplished by someone skilled in the art who can analyze sequence databases for 
common regulatory sequences. 
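
One very simple way to group genes with similar expression patterns is sketched below: each gene joins the first cluster whose seed gene it correlates with above a threshold, and otherwise starts a new cluster. This greedy scheme and the threshold of 0.9 are assumptions for illustration only; they stand in for, and do not reproduce, the statistical clustering procedures commonly used in the art.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return cov / (sx * sy)


def cluster_genes(profiles, threshold=0.9):
    """Greedy grouping of genes by correlation with each cluster's seed gene.

    profiles: {gene: [expression value per sample or time point]}
    Returns a list of clusters; the first gene in each cluster is its seed.
    """
    clusters = []
    for gene in sorted(profiles):
        for cluster in clusters:
            if pearson(profiles[gene], profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])  # no similar seed found: start a new cluster
    return clusters
```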

[128] In yet another embodiment, the toxicity predictive genes can be used to 

predict toxicity responses in other species, for example, human, non-human primate, 
mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog.
Some members of the toxicity predictive genes may also be more suitable for 
prediction of toxicity in species other than the species used to derive the database 
(rat in the case of the examples provided). One method for identification of such
genes that would be available to someone skilled in the art would be to examine
DNA sequence databases to determine whether orthologous sequences to the 
predictive genes exist in the target species and how close the orthologous 
sequences are to the predictive gene sequences. One of skill in the art can examine 
the orthologous sequences for similarity in amino acid coding regions and motifs as 
well as for similarities in regulatory regions and motifs of the gene. 

[129] In another embodiment, necrosis predictive genes or gene sequences are 

used for screening other potential toxicity predictive genes or gene sequences in 
other species or even within the same species using methods known in the art. See, 
for example, Sambrook supra. Gene sequences that hybridize under stringent
conditions to the toxicity predictive gene sequences disclosed herein may be
selected as potential toxicity predictive genes. Additionally, genes which 
demonstrate significant homology with the toxicity predictive genes disclosed herein 
(preferably at least about 70%) may be selected as toxicity predictive gene 
candidates. It is understood that conservative substitutions of amino acids are 
possible for gene sequences that have some percentage homology with the necrosis 
predictive gene sequences of this invention. A conservative substitution in a protein 
is a substitution of one amino acid with an amino acid with similar size and charge. 
Groups of amino acids known normally to be equivalent are: (a) Ala, Ser, Thr, Pro, 
and Gly; (b) Asn, Asp, Glu, and Gln; (c) His, Arg, and Lys; (d) Met, Leu, Ile, and Val;
and (e) Phe, Tyr, and Trp. 
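
The distinction between identity and homology can be illustrated with the sketch below, which scores two pre-aligned, equal-length amino acid sequences in one-letter code. It is a simplification under stated assumptions: the equivalence groups (a) through (e) above are used in place of the BLAST similarity criteria, group (d) is read as Met, Leu, Ile and Val, and no alignment or gap handling is performed.

```python
# Equivalence groups (a)-(e) in one-letter code; group (d) is read here as
# Met, Leu, Ile, Val, which is an assumption about the intended list.
CONSERVATIVE_GROUPS = [
    set("ASTPG"),  # (a) Ala, Ser, Thr, Pro, Gly
    set("NDEQ"),   # (b) Asn, Asp, Glu, Gln
    set("HRK"),    # (c) His, Arg, Lys
    set("MLIV"),   # (d) Met, Leu, Ile, Val
    set("FYW"),    # (e) Phe, Tyr, Trp
]


def identity_and_homology(seq1, seq2):
    """Percent identity and percent homology of two aligned sequences.

    Identity counts positions with the same residue; homology also counts
    positions whose residues fall in the same equivalence group.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and equal in length")
    same = similar = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            same += 1
            similar += 1
        elif any(a in group and b in group for group in CONSERVATIVE_GROUPS):
            similar += 1
    n = len(seq1)
    return 100 * same / n, 100 * similar / n
```

Under this sketch, a candidate would meet the homology threshold mentioned above when the second returned value is at least about 70.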

[130] It is understood that the predictive toxicity genes can be used as guides to 

predicting toxicity for agents that have been administered via different routes 
(intraperitoneal, intravenous, oral, dermal, inhalation, mucosal, etc.) from the routes 
that were used to generate the database or to identify the toxicity predictive genes. 
Furthermore, the invention is not intended to be limited to agents that have been
administered at the same dosages as the agents that were used to generate the
database or to identify the predictive toxicity genes.

[131] Data described in the examples were generated using the microarray 

technology disclosed in the Examples. However, the invention is not dependent on 
using this particular platform. Other similar gene expression analysis technologies 
may be incorporated in the practice of this invention. These can include, but are not 
limited to, other arrays containing the predictive genes, RT-PCR (e.g., TaqMan®), 
branched chain technology, RNAse protection or any other method which 
quantitatively detects the expression of RNA polynucleotides. The invention can be 
practiced using these other technologies by generating a database of expression 
measurements for the predictive genes using samples such as those used in the 
database described in Example 1. This database can then be used in a model such 
as the K-nearest neighbor model or can be used to develop any of a number of other
models.

[132] The following Examples are provided to illustrate but not to limit the invention

in any manner. 

[133] Example 1

[134] Database of Compounds and Toxicity: Compounds and treatments list used 

to construct a database are given in Table 1. This table also provides evaluation of 
the toxicity observed as necrosis in samples collected 72 hours after treatment. 

[135] Database of Animal Experiments: Sprague Dawley rats Crl:CD from Charles 

River, Raleigh, NC were divided into treated rats that received a specific concentration
of the compound (see Table 1) and control rats that only received the vehicle in 
which the compound is mixed (e.g., saline). 

[136] At specified timepoints (6h, 24h and 72h) after administration (intraperitoneal

route) of the compound, a set number of rats (usually 3 control and 3 treated) were 
euthanized and tissues collected. Each rat was heavily sedated with an overdose of 
CO2 by inhalation and a maximum amount of blood drawn. Exsanguination of the rat
by this drawing of blood kills the rat. The method of collecting the tissues is very 
important and ensures preserving the quality of the mRNA in the tissues. The body 
of the rat was then opened up and prosectors rapidly removed the tissues (including ) 
and immediately placed them into liquid nitrogen. The organs/tissues were frozen 
within 3 minutes of the death of the animal to ensure that mRNA did not degrade. 
The organs/tissues were then packaged into well-labeled plastic freezer quality bags 
and stored at -80 degrees until needed for isolation of the mRNA from a portion of 
the organ/tissue sample. 

[137] Isolating DNA/RNA from animal tissues or cells: Total RNA was isolated
from tissue samples using the following materials: Qiagen RNeasy midi kits, 2-
mercaptoethanol, liquid N2, tissue homogenizer, dry ice. Samples were kept on ice
when specified.

[138] If a tissue needed to be broken, then the tissue sample was placed on a
double layer of aluminum foil which was then placed within a weigh boat containing a
small amount of liquid nitrogen. The aluminum foil was folded around the tissue and
then struck by a small foil-wrapped hammer to administer mechanical stress forces.

[139] About 0.15-0.20 g of tissue was weighed out and placed in a sterile
container. To preserve integrity of the RNA, tissues were kept on dry ice while other
samples were being weighed. RLT buffer (Qiagen®) was added to the sample to
aid in the homogenization process. The tissue was homogenized using a commercially
available homogenizer (IKA Ultra Turrax T25 homogenizer) with the 7 mm microfine
sawtooth shaft and generator (195 mm long with a processing range of 0.25 ml to 20
ml, item # 372718). After homogenization, samples were stored on ice until all samples
were homogenized. The homogenized tissue sample was spun to remove nuclei,
thus reducing DNA contamination. The supernatant of the lysate was then
transferred to a clean container containing an equal volume of 70% EtOH in DEPC-
treated H2O and mixed. RNA was isolated by putting the supernatant through an
RNeasy spin column, washed, and subsequently eluted. Small quantities of
remaining DNA were removed by use of DNase enzyme during the RNA isolation
procedure following the instructions provided by Qiagen or, alternatively, by lithium
chloride (LiCl) precipitation following the RNA isolation. The isolated RNA pellet was
stored in RNase-free water or in an RNA storage buffer (10 mM sodium citrate,
Ambion Cat #7000). The RNA amount was then quantitated using a
spectrophotometer.

[140] Rat 700 CT chip: Gene expression data was generated from a microarray 

chip that has a set of toxicologically relevant rat genes that are used to predict 
toxicological responses. The rat 700 CT gene array is disclosed in pending U.S. 
applications 60/264,933; 60/308,161; and pending application filed on January 29, 
2002 (serial number 10/060,893). 

[141] Microarray RT reaction: Fluorescence-labeled first strand cDNA probe was
made from the total RNA or mRNA isolated from livers of control and treated rats. This
probe was hybridized to microarray slides spotted with DNA specific for
toxicologically relevant genes. The materials needed are: total or messenger RNA,
primer, Superscript II buffer, dithiothreitol (DTT), nucleotide mix, Cy3 or Cy5 dye,
Superscript II (RT), ammonium acetate, 70% EtOH, PCR machine, and ice.

[142] The volume of each sample that would contain 20 µg of total RNA (or 2 µg of
mRNA) was calculated. The amount of DEPC water needed to bring the total volume
of each RNA sample to 14 µl was also calculated. If RNA was too dilute, the
samples were concentrated to a volume of less than 14 µl in a speedvac without
heat. The speedvac must be capable of generating a vacuum of 0 milliTorr so that
samples can freeze dry under these conditions. Sufficient volume of DEPC water
was added to bring the total volume of each RNA sample to 14 µl. Each PCR tube
was labeled with the name of the sample or control reaction. The appropriate volume
of DEPC water and 8 µl of anchored oligo dT mix (stored at -20°C) was added to
each tube.
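
The volume calculation in this step can be sketched as follows. This is only an illustration: the function name and the example concentrations are hypothetical, and the 20 µg target and 14 µl final volume are taken from the paragraph above.

```python
def rna_and_water_volumes(conc_ug_per_ul, target_ug=20.0, final_volume_ul=14.0):
    """Return (RNA volume, DEPC water volume, needs_concentrating) in microliters.

    conc_ug_per_ul is the measured RNA concentration (hypothetical input).
    """
    rna_ul = target_ug / conc_ug_per_ul          # volume holding the target mass
    water_ul = final_volume_ul - rna_ul          # water needed to reach 14 ul
    needs_concentrating = water_ul < 0           # sample too dilute for 14 ul
    return rna_ul, max(water_ul, 0.0), needs_concentrating


# A 2 ug/ul sample needs 10 ul RNA plus 4 ul water; a 1 ug/ul sample
# would need 20 ul, so it has to be concentrated first.
print(rna_and_water_volumes(2.0))
print(rna_and_water_volumes(1.0))
```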

[143] Then the appropriate volume of each RNA sample was added to the labeled
PCR tube. The samples were mixed by pipetting. The tubes were kept on ice until
the samples were ready for the next step; it is preferable for the tubes to be kept on
ice until the next step is ready to proceed. The samples were incubated in a PCR machine
for 10 minutes at 70°C followed by a 4°C incubation period until the sample tubes were
ready to be retrieved. The sample tubes were left at 4°C for at least 2 minutes.

[144] The Cy dyes are light sensitive, so any solutions or samples containing Cy- 

dyes should be kept out of light as much as possible (e.g., cover with foil) after this 
point in the process. Sufficient amounts of Cy3 and Cy5 reverse transcription mix 
were prepared for one to two more reactions than would actually be run by scaling up 
the following: 

[145] For labeling with Cy3 



8 µl 5x First Strand Buffer for Superscript II, 4 µl 0.1 M DTT, 2 µl Nucleotide Mix, 2 µl of
1:8 dilution of Cy3 (e.g., 0.125 mM Cy3-dCTP) and 2 µl Superscript II

[146] For labeling with Cy5

8 µl 5x First Strand Buffer for Superscript II, 4 µl 0.1 M DTT, 2 µl Nucleotide Mix, 2 µl of
1:10 dilution of Cy5 (e.g., 0.1 mM Cy5-dCTP) and 2 µl Superscript II

[147] About 18 µl of the pink Cy3 mix was added to each treated sample and 18 µl
of the blue Cy5 mix was added to each control sample. Each sample was mixed by
pipetting. The samples were placed in a DNA engine (PTC-200 Peltier Thermal
Cycler, MJ Research) for 2 hours at 45°C followed by 4°C until the sample tubes
were ready to be retrieved.
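
As a rough illustration of the scale-up described for the reverse transcription mixes, the sketch below multiplies the per-reaction volumes by the number of reactions plus a small excess. The dictionary and function names are hypothetical; the per-reaction volumes are those listed for the Cy3 and Cy5 mixes and sum to the 18 µl added per sample.

```python
# Per-reaction volumes (microliters) as listed for either Cy mix.
CY_MIX_UL = {
    "5x First Strand Buffer": 8.0,
    "0.1 M DTT": 4.0,
    "Nucleotide Mix": 2.0,
    "diluted Cy dye": 2.0,
    "Superscript II": 2.0,
}


def scale_mix(n_reactions, extra=2):
    """Scale the mix for n_reactions plus `extra` spare reactions (one to two)."""
    total = n_reactions + extra
    return {name: volume * total for name, volume in CY_MIX_UL.items()}


# Six treated samples with two spare reactions: 8 reaction equivalents.
for name, volume in scale_mix(6).items():
    print(f"{name}: {volume} ul")
```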

[148] In addition to the desired cDNA product, the RT reaction contained impurities 

that must be removed. These impurities included excess primers, nucleotides, and 
dyes. The primary method of removing the impurities was by following the 
instructions in the QIAquick PCR purification kit (Qiagen cat#1 20016). 

[149] Alternatively, the RT reactions were cleaned of impurities by ethanol
precipitation and resin bead binding. The samples from the DNA engine were
transferred to Eppendorf tubes containing 600 µl of ethanol precipitation mixture and
placed in a -80°C freezer for at least 20-30 minutes. These samples were centrifuged
for 15 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C) and the
supernatant was carefully decanted. A visible pellet was seen (pink/red for Cy3, blue for Cy5).
Ice cold 70% EtOH (about 1 ml per tube) was used to wash the tubes and the tubes
were subsequently inverted to clean tube and pellet. The tubes were centrifuged for
10 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C), then the
supernatant was carefully decanted. The tubes were air dried for about 5 to 10
minutes, protected from light. When the pellets were dried, they were resuspended
in 80 µl nanopure water. The cDNA/mRNA hybrid was denatured by heating for 5
minutes at 95°C in a heat block and flash spun. Then the lid of a "Millipore MAHV
N45" 96 well plate was labeled with the appropriate sample numbers. A blue gasket
and waste plate (v-bottom 96 well) was attached. About 160 µl of Wizard DNA
Binding Resin (Promega cat#A1151) was added to each well of the filter plate that
was used. Probes were added to the appropriate wells (80 µl cDNA samples)
containing the Binding Resin. The reaction was mixed by pipetting up and down ~10
times. The plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or
equivalent) and then the filtrate was decanted. About 200 µl of 80% isopropanol was
added, the plates were spun for 5 minutes at 2500 rpm, and the filtrate was
discarded. Then the 80% isopropanol wash and spin step was repeated. The filter
plate was placed on a clean collection plate (v-bottom 96 well) and 80 µl of Nanopure
water, pH 8.0-8.5 was added. The pH was adjusted with NaOH. The filter plate was
secured to the collection plate and after 5 minutes was centrifuged for 7 minutes at
2500 rpm.

[150] Purification of Cy-Dye Labeled cDNA: To purify fluorescence-labeled first
strand cDNA probes, the following materials were used: Millipore MAHV N45 96 well
plate, v-bottom 96 well plate (Costar), Wizard DNA binding Resin, wide orifice pipette
tips for 200 to 300 µl volumes, isopropanol, nanopure water. It is highly preferable to
keep the plates aligned at all times during centrifugation. Misaligned plates lead to
sample cross contamination and/or sample loss. It is also important that plate
carriers are seated properly in the centrifuge rotor.

[151] The lid of a "Millipore MAHV N45" 96 well plate was labeled with the
appropriate sample numbers. A blue gasket and waste plate (v-bottom 96 well) was
attached. Wizard DNA Binding Resin (Promega cat#A1151) was shaken
immediately prior to use for thorough resuspension. About 160 µl of Wizard DNA
Binding Resin was added to each well of the filter plate that was used. If this was
done with a multi-channel pipette, wide orifice pipette tips were used to
prevent clogging. It is highly preferable not to touch or puncture the membrane of the
filter plate with a pipette tip. Probes were added to the appropriate wells (80 µl cDNA
samples) containing the Binding Resin. The reaction was mixed by pipetting up and
down ~10 times. It is preferable to use regular, unfiltered pipette tips for this step.
The plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or equivalent)
and then the filtrate was decanted. About 200 µl of 80% isopropanol was added, the
plates were spun for 5 minutes at 2500 rpm, and the filtrate was discarded. Then the
80% isopropanol wash and spin step was repeated. The filter plate was placed on a
clean collection plate (v-bottom 96 well) and 80 µl of Nanopure water, pH 8.0-8.5 was
added. The pH was adjusted with NaOH. The filter plate was secured to the
collection plate with tape to ensure that the plate did not slide during the final spin.
The plate sat for 5 minutes and was centrifuged for 7 minutes at 2500 rpm.
Replicates of samples should be pooled.

[152] Dry-down Process: Concentration of the cDNA probes is preferable so that 

they can be resuspended in hybridization buffer at the appropriate volume. The 
volume of the control cDNA (Cy-5) was measured and divided by the number of 
samples to determine the appropriate amount to add to each test cDNA (Cy-3). 
Eppendorf tubes were labeled for each test sample and the appropriate amount of 
control cDNA was allocated into each tube. The test samples (Cy-3) were added to 
the appropriate tubes. These tubes were placed in a speed-vac to dry down, with foil 
covering any windows on the speed vac. At this point, heat (45°C) may be used to 
expedite the drying process. Samples may be saved in dried form at -20°C for up to 
14 days. 

[153] Microarray Hybridization: To hybridize labeled cDNA probes to single
stranded, covalently bound DNA target genes on glass slide microarrays, the
following materials were used: formamide, SSC, SDS, 0.2 µm syringe filter, salmon
sperm DNA (Sigma, cat # D-7656), human Cot-1 DNA (Life Technologies, cat #
15279-011), poly A (40 mer: Life Technologies, custom synthesized), yeast tRNA
(Life Technologies, cat # 15401-04), hybridization chambers, incubator, coverslips,
parafilm, heat blocks. It is preferable that the array is covered to ensure proper
hybridization.

[154] About 30 µl of hybridization buffer was prepared per cDNA sample (control
rat cDNA plus treated rat cDNA). Slightly more than what is needed should be
made since about 100 µl of the total volume made for hybridizations can be lost
during filtration.

[155] Hybridization Buffer: for 100 µl:

• 50% Formamide: 50 µl formamide

• 5X SSC: 25 µl 20X SSC

• 0.1% SDS: 25 µl 0.4% SDS

[156] The solution was filtered through a 0.2 µm syringe filter, then the volume was
measured. About 1 µl of salmon sperm DNA (10 mg/ml) was added per 100 µl of
buffer.

[157] Alternatively, the hybridization buffer was made up as: 

Hybridization Buffer: for 101 µl:

• 50% Formamide: 50 µl formamide

• 10X SSC: 50 µl 20X SSC

• 0.2% SDS: 1 µl 20% SDS



[158] The solution was filtered through a 0.2 µm syringe filter, then the volume was
measured. One microliter of salmon sperm DNA (9.7 mg/ml), 0.5 µl Human Cot-1
DNA (5 µg/µl), 0.5 µl poly A (5 µg/µl), and 0.25 µl Yeast tRNA (10 µg/µl) were added per
100 µl of buffer. The hybridization buffers were compared in validation studies and
there was no change in differential gene expression data between the two buffers.
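
The stated final concentrations of the two buffer recipes follow from simple dilution arithmetic (stock concentration times stock volume, divided by total volume). A minimal check, using a hypothetical helper name, is sketched here; note that the second recipe's nominal 50%, 10X and 0.2% values are approximate because the total volume is 101 µl.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after diluting a stock into the total volume (same units as stock)."""
    return stock_conc * stock_volume_ul / total_volume_ul


# First recipe (100 ul total): 20X SSC and 0.4% SDS stocks.
print(final_concentration(20.0, 25.0, 100.0))   # SSC, in X
print(final_concentration(0.4, 25.0, 100.0))    # SDS, in percent

# Alternative recipe (101 ul total): 20X SSC and 20% SDS stocks.
print(round(final_concentration(20.0, 50.0, 101.0), 2))  # SSC, close to 10X
print(round(final_concentration(20.0, 1.0, 101.0), 3))   # SDS, close to 0.2%
```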

[159] Materials used for hybridization were: 2 Eppendorf tube racks, hybridization
chambers (2 arrays per chamber), slides, coverslips, and parafilm. About 30 µl of
nanopure water was added to each hybridization chamber. Slides and coverslips
were cleaned using an N2 stream. About 30 µl of hybridization buffer was added to
dried probe and vortexed gently for 5 seconds. The probe remained in the dark for
10-15 minutes at room temperature and then was gently vortexed for several
seconds and then was flash spun in the microfuge. The probes were boiled or
placed in a 95°C heat block for 5 minutes and centrifuged for 3 min at 20800 x g
(14000 rpm, Eppendorf model 5417C). Probes were placed in a 70°C heat block.
Each probe remained in this heat block until it was ready for hybridization.

[160] About 25 µl was pipetted onto a coverslip. It is highly preferable to avoid the
material at the bottom of the tube and to avoid generating air bubbles. This may
mean leaving about 1 µl remaining in the pipette tip. The slide was gently lowered,
face side down, onto the sample so that the coverslip covered that portion of the slide
containing the array. Slides were placed in a hybridization chamber (2 per chamber).
The lid of the chamber was wrapped with parafilm and the slides were placed in a
42°C humidity chamber in a 42°C incubator. It is preferable to not let probes or
slides sit at room temperature for long periods. The slides were incubated for 18-24
hours.

[161] Post-Hybridization Washing: To obtain only single stranded cDNA probes
tightly bound to the sense strand of target cDNA on the array, non-specifically bound
cDNA probe should be removed from the array. Removal of non-specifically bound
cDNA probe was accomplished by washing the array and using the following
materials: slide holder, glass washing dish, SSC, SDS, and nanopure water. Six
glass buffer chambers and glass slide holders were set up with 2X SSC buffer heated
to 30-34°C and used to fill up the glass dish to 3/4 of volume or enough to submerge
the microarrays. The slides were placed in 2X SSC buffer for 2 to 4 minutes while
the cover slips fell off. The slides were then moved to 2X SSC, 0.1% SDS and
soaked for 5 minutes. The slides were transferred into 0.1X SSC and 0.1% SDS for
5 minutes. Then the slides were transferred to 0.1X SSC for 5 minutes. The slides,
still in the slide carrier, were transferred into nanopure water (18 megaohms) for 1
second. To dry the slides, the stainless steel slide carriers were placed on micro-
carrier plates and spun in a centrifuge (Beckman GS-6 or equivalent) for 5 minutes at
1000 rpm.

[162] Scanning slides: The washed and dried hybridized slides were scanned on an
Axon Instruments Inc. GenePix 4000A MicroArray Scanner and the fluorescent
readings from this scanner were converted into quantitation files (.gpr) on a computer
using GenePix software.

[163] Array Data, Normalization and Transformation: GeneSpring™ software
(Version 4.1, Silicon Genetics) was used for statistical analyses including
identification of genes whose expression correlated with histopathology scores, K-means
and tree cluster analysis, and predictive modeling using the K-nearest
neighbor model (Predict Parameter Values tool).

[164] Microarray data were loaded into GeneSpring™ software for analysis as
GenePix files as above. Specific data loaded into GeneSpring™ software included
gene name, GenBank ID, control channel mean fluorescence and signal channel
mean fluorescence. Expression ratio data (ratio of signal to control fluorescence)
were normalized using the 50th percentile of the distribution of genes and control
channel. Ratio data were excluded from analysis if the control channel value was <0.
For analysis of correlations and predictive values, gene expression ratios were
transformed as the log of the ratio.
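
One possible reading of this normalization and transformation is sketched below: compute signal/control ratios, divide each by the median (50th percentile) ratio on the array, and take the logarithm. This is an assumption about how the GeneSpring per-array normalization behaves, not a description of its implementation; the function name, the use of the natural log, and the exclusion of zero-valued control spots are illustrative choices.

```python
import math
from statistics import median


def normalized_log_ratios(signal, control):
    """Return median-normalized log ratios; None marks an excluded spot."""
    # Assumption: spots with control <= 0 are excluded. The text excludes
    # values < 0; zero is also dropped here to avoid division by zero.
    ratios = [s / c if c > 0 else None for s, c in zip(signal, control)]
    usable = [r for r in ratios if r is not None and r > 0]
    mid = median(usable)  # 50th percentile of the usable ratios on this array
    return [
        math.log(r / mid) if r is not None and r > 0 else None
        for r in ratios
    ]


# Hypothetical fluorescence values for four spots; the last is excluded.
print(normalized_log_ratios([2.0, 4.0, 8.0, 5.0], [1.0, 1.0, 1.0, -1.0]))
```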

[165] Correlation with Histopathology Scores: Histopathology scores for each 

animal (assigned on a compound-dose basis as indicated in Table 1) were entered 
with gene expression data by using the GeneSpring™ 'Drawn Gene' function. 
Correlations between the histopathology scores and gene expression were 
conducted with the distance measures listed below: 

standard positive and negative correlation 

smooth positive and negative correlation

change positive correlation

upregulated positive correlation 

Pearson positive and negative correlation 

Spearman positive and negative correlation 

distance positive correlation 

[166] These correlation or similarity measures are standard statistical correlation
measures that are described in the GeneSpring Advanced Analysis Techniques
Manual (Release Date March 13, 2001, Silicon Genetics). Where both positive and
negative correlations were obtained, combined positive and negative correlating gene
lists were also created.

[167] Class Prediction: The Predict Parameter Values tool in GeneSpring™ 

software was used for toxicity class prediction. The following is a summary of the 
procedure used in the GeneSpring predictive software. This is described in the
GeneSpring Advanced Analysis Techniques Manual (Release Date March 13, 2001,
Silicon Genetics) with additional information supplied by Silicon Genetics and a
statistical expert. The prediction tool relies on standard statistical procedures that 
can be implemented in a variety of statistical software packages. 

[168] Gene Selection: Genes to be used for prediction are picked through variable
selection. This entails taking a single gene and a single class (e.g., toxicity) and
creating a contingency table. In the table below, columns 1 through N of the table
each represent one possible cutoff point based on the gene expression level (ratio of
signal/control) for that class. The number of possible cutoffs is less than or equal to
the total number of samples for the class (e.g., A). It is possibly less than the total
number, since there may be ties in gene expression level. Hence, N, M, and Q may
or may not be distinct. In the example, an n-class problem is illustrated, where x and
y entries are the class counts at that gene expression cutoff level, for that specific
gene and class, either above ("a") or below ("b") the cutoff. "Class1" is the set of all
samples in Class1 (above or below the cutoff), and "!Class1" is the set of all those not in
Class1 (above or below the cutoff), and similarly for the other classes. The class
totals in the training set are the total class marginals used to compute Fisher's exact
test.

[169] For a specific gene, and for each class, the best p-value as calculated by
Fisher's Exact Test for independence between one of the pairs of columns (e.g., 1a
and 1b) and the actual class totals (e.g., A) is used to score the gene (-ln(p) = the
score) for that class. Thus, there are N (or M, Q, etc.) contingency tables, where the
best score of the N tables is used for that class and gene. The wider the disparity
between the above and below counts in either the a or b column (this is a two-sided
Fisher's Exact Test), the smaller the p-value and the higher the score.

[170] The genes per class are rank ordered by the most discriminating (highest)
score. The predictivity list is composed of the most discriminating genes per class.
Namely, genes are combined that best discriminate class 1 with those that best
discriminate class 2 and so on. The genes are selected in rotation of the highest
score per class. Duplicate genes are ignored in the rotation and not added to the list;
the gene with the next highest score is taken instead.
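
The scoring and rotation steps described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the GeneSpring implementation: all function names are hypothetical, cutoffs are taken at each distinct expression value, "above" means strictly greater than the cutoff, and the two-sided Fisher p-value sums the probabilities of all tables no more likely than the observed one.

```python
import math


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def gene_score(values, labels, target):
    """Score = -ln(best p) over all cutoffs, for one gene and one class."""
    best_p = 1.0
    for cutoff in sorted(set(values)):
        pairs = list(zip(values, labels))
        a = sum(1 for v, l in pairs if l == target and v > cutoff)   # in class, above
        b = sum(1 for v, l in pairs if l == target and v <= cutoff)  # in class, below
        c = sum(1 for v, l in pairs if l != target and v > cutoff)   # not in class, above
        d = sum(1 for v, l in pairs if l != target and v <= cutoff)  # not in class, below
        best_p = min(best_p, fisher_two_sided(a, b, c, d))
    return -math.log(best_p)


def select_genes(expr, labels, n_genes):
    """Pick genes in rotation over classes, skipping genes already chosen."""
    classes = sorted(set(labels))
    ranked = {
        c: sorted(expr, key=lambda g: gene_score(expr[g], labels, c), reverse=True)
        for c in classes
    }
    position = {c: 0 for c in classes}
    chosen = []
    limit = min(n_genes, len(expr))
    while len(chosen) < limit:
        for c in classes:
            while position[c] < len(ranked[c]) and ranked[c][position[c]] in chosen:
                position[c] += 1  # duplicate: move to the next highest score
            if position[c] < len(ranked[c]) and len(chosen) < limit:
                chosen.append(ranked[c][position[c]])
    return chosen


# Hypothetical expression ratios for three genes across six samples.
expr = {"g1": [1, 2, 3, 10, 11, 12], "g2": [5, 1, 6, 2, 7, 3], "g3": [4, 4, 4, 4, 4, 4]}
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(select_genes(expr, labels, 1))
```

Here `g1` separates the two classes perfectly at a cutoff of 3, so it receives the highest score for both classes and is selected first.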

[171] The training samples now have only the gene list garnered from the above
procedure. As an example, where once the training samples may have had an initial
list of 200 genes per sample, they now have only a subset composed of the gene list,
say, 50 (the number of predictivity genes specified) that are selected from the initial
list by the gene selection procedure. Thus, each sample is a vector of 50
normalized expression ratios. Since the selection of genes is done in rotation, in a
two-class problem the list contains 25 genes for one class, and 25 for the other class.
The matrix below
illustrates the basic features of this gene selection process.

            above    below    ...    above    below    Totals
Class1      x1.1a    x1.1b    ...    x1.Na    x1.Nb
!Class1     y1.1a    y1.1b    ...    y1.Na    y1.Nb
Class2      x1.2a    x1.2b    ...    x1.Ma    x1.Mb
!Class2     y1.2a    y1.2b    ...    y1.Ma    y1.Mb
...
Classn      x1.na    x1.nb    ...    x1.Qa    x1.Qb    X
!Classn     y1.na    y1.nb    ...    y1.Qa    y1.Qb    Y



[172] Classifying the Test Samples: After the genes to be used in the training set
have been selected, the test set is classified based on the k-nearest neighbor (knn)
voting procedure. Using just those genes in the gene list, for each sample in the test
set of samples, the k nearest neighbors in the training set are found with the
Euclidean distance. The class of each of the k nearest neighbors is
determined, and the test set sample is assigned to the class with the largest
representation in the k nearest neighbors after adjusting for the proportion of classes
in the training set.

[173] For example, in a two-class problem, let there be 30 samples of class 1 and
60 samples of class 2 in the training set. With k = 9, say it is determined that 7
of the nearest neighbors to a sample from the testing set are in class 1. The sample
can then be classified as being a member of class 1. If another sample from the test
set has a total of 4 nearest neighbors in class 1, then after adjusting for the proportion, this
sample would be assigned to class 1 rather than class 2, even though the majority
vote suggests assignment to class 2.
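
A minimal sketch of this voting procedure follows. The text does not spell out the exact adjustment, so the code assumes that each class's neighbor count is divided by that class's size in the training set (4/30 versus 5/60 in the example above); the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def adjusted_vote(votes, class_sizes):
    """Pick the class whose neighbor count is largest relative to its training size."""
    return max(votes, key=lambda c: votes[c] / class_sizes[c])


def knn_classify(sample, training, k):
    """training is a list of (expression_vector, class_label) pairs."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    class_sizes = Counter(label for _, label in training)
    return adjusted_vote(votes, class_sizes)


# The two-class example from the text: 4 of 9 neighbors in class 1,
# with 30 class 1 and 60 class 2 samples in the training set.
print(adjusted_vote({"class 1": 4, "class 2": 5}, {"class 1": 30, "class 2": 60}))
```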

[174] Decision Threshold: The decision threshold is a mechanism to help clearly 

define the class into which the sample will fall, and can be set to reject classification if 
the voting is very close or tied. (Thus, k can be even for two-class problems without 
worrying about the tie problem.) A p-value is calculated for the proportion of 
neighbors in each class against the proportions found in the training set, again using 
Fisher's exact test, but now a one-sided test. 

[175] For example, let k = 11. If the proportion of neighbors of class 1 in the test
set is 6/11, and the proportion of class 1 in a 100 sample training set is 0.4, the
p-value calculated is 0.29 (half the two-sided test). If the proportion in the training set is
0.1, the p-value is 0.004. The smaller the p-value, the greater the likelihood that the
sample from the testing set belongs to that class.

[176] A p-value ratio (P-value) is set as a way of setting the level of confidence in 

individual sample predictions based on the ratio of p-values for the best class (lowest 
p-value) versus the second best class (second lowest p-value). For example, if the 
P-value is set at 0.5 and the ratio of p-values for a particular sample is 0.6, then the 
predictive model will not make a call for that sample. 
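
The call/no-call rule can be illustrated with a short sketch. The function name is hypothetical, and the per-class p-values are assumed to have been computed already (for example by the one-sided Fisher's exact test described above); a call is made only when the ratio of the lowest to the second lowest p-value does not exceed the cutoff.

```python
def make_call(p_values, ratio_cutoff=0.5):
    """Return the best class, or None (no call) if the p-value ratio exceeds the cutoff.

    p_values maps each class label to its p-value for one test sample.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    (best_class, best_p), (_, second_p) = ranked[0], ranked[1]
    if second_p == 0:
        return None  # ratio undefined; treated as a non-call in this sketch
    ratio = best_p / second_p
    return best_class if ratio <= ratio_cutoff else None


# A ratio of 0.6 with a cutoff of 0.5 gives no call, as in the text.
print(make_call({"yes": 0.06, "no": 0.10}))
# A clear difference between the two p-values gives a call.
print(make_call({"yes": 0.004, "no": 0.29}))
```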

[177] Training and Test Data Sets: The data were separated into 5 training and
test sets by randomly distributing the compounds into the sets. This was
accomplished by assigning random numbers to lists of compounds that are negative
and positive for histopathology, sorting by random number, and then dividing the
sorted lists into a specific number of training and test sets. The training and test set
assignments are presented in Table 2.
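
The random assignment could be implemented along the following lines. The text does not state how the five training/test pairs relate to one another, so this sketch assumes a cross-validation style layout in which each of five groups of compounds serves once as the test set; the function name, the seed, and the compound labels are hypothetical.

```python
import random


def make_training_test_sets(positives, negatives, n_sets=5, seed=0):
    """Return a list of (training_compounds, test_compounds) pairs."""
    rng = random.Random(seed)
    groups = [[] for _ in range(n_sets)]
    for compounds in (positives, negatives):
        # Assign a random number to each compound and sort by it.
        keyed = sorted((rng.random(), compound) for compound in compounds)
        # Deal the sorted list into the groups so each keeps both classes.
        for i, (_, compound) in enumerate(keyed):
            groups[i % n_sets].append(compound)
    everything = list(positives) + list(negatives)
    return [([c for c in everything if c not in test], test) for test in groups]


positives = [f"P{i}" for i in range(10)]   # compounds positive for histopathology
negatives = [f"N{i}" for i in range(15)]   # compounds negative for histopathology
sets = make_training_test_sets(positives, negatives)
print(sets[0][1])
```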

[178] Toxicology Classification: Toxicity classifications were entered for training and
test set as a parameter column. Toxicity, as defined by observation of necrosis in the
liver at 72 hours after treatment, was entered as a "yes" or "no" for each animal in a
compound-dose group. Additionally, a parameter column for random histopathology
classification was designated. This was done by randomly assigning the same
number of "yes" and "no" calls to the individual animals.

[179] Prediction Output and Initial Data Processing: The "Predict Parameter
Value" tool of GeneSpring was used with each of the training and test sets to
generate predictions of histopathology classifications of the test sets. Unless
otherwise specified, a nearest neighbor setting of 10 (default) and P-value ratio cutoff
of 0.5 were used. The number of genes used to predict was varied with standard
numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes used. For each number of genes the
numbers of correct calls, incorrect calls and non-calls were recorded. Non-calls are
cases where no prediction was made because the P-value ratio exceeded the
specified P-value ratio cutoff. Calculations were made for overall percent correct
calls (number of correct classifications/number of samples), percent correct calls of
called samples (number of correct classifications/number of samples with calls) and
percent of called samples (samples with calls/number of samples).
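
The three percentages defined above can be computed as in the following sketch, where a non-call is represented by `None`. The function name and the example calls are hypothetical.

```python
def call_statistics(actual, predicted):
    """Compute the three call percentages; predicted entries may be None (non-call)."""
    n_samples = len(actual)
    called = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    n_correct = sum(1 for a, p in called if a == p)
    return {
        "overall_percent_correct": 100.0 * n_correct / n_samples,
        "percent_correct_of_called": 100.0 * n_correct / len(called) if called else 0.0,
        "percent_called": 100.0 * len(called) / n_samples,
    }


# Four samples: two correct calls, one incorrect call, one non-call.
stats = call_statistics(["yes", "yes", "no", "no"], ["yes", None, "no", "yes"])
print(stats)
```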

[180] For each input list and optimal number of predictive genes (lowest number of 

genes giving a maximum overall percent of correct calls) additional information was 
recorded that included the list of specific genes in the optimum predictive set. 

[181] Results: Expression array data were examined for the existence of genes 

whose expression correlated with histopathology scores. Table 1 presents a list of 
the compounds and dose levels along with the histopathology classification and 
histopathology severity scores used for this analysis. For each distance measure the 
probability was adjusted in increments of 0.05 until at least 50 correlating genes were 
obtained. Lists of correlating genes were obtained using the distance measures 
described in Materials and Methods. Example sets of correlating genes are provided 
in Tables 3 and 4. 

[182] The correlating gene lists as well as the entire array gene list were provided
as input lists to the GeneSpring Predict Parameter Value tool (described in Materials
and Methods) that employs a K-nearest neighbor (knn) predictive model.
These lists as well as the entire array gene list were used for each of the five training
and test sets defined in Materials and Methods to generate predictions of
histopathology classifications of the test sets. Input genes for the Predict Parameter
Value feature included all 700 genes in the GenePix file (the rat CT Array) which
were disclosed in a currently pending application (serial number 10/060,893) filed on
January 29, 2002, as well as smaller lists of genes whose expressions correlated
with histopathology by the correlation measures described previously. The number
of genes used to predict was varied with standard numbers of 50, 40, 30, 20, 10, 5, 2
and 1 genes used. The specified number of predictive genes was varied to obtain an
optimum number of predictive genes. Figure 4 presents a typical profile for obtaining
an optimum gene list.

[183] After this was done for 5 training and test sets, all gene lists were then 

merged to create one aggregate list of predictive genes. Each gene on this 
aggregate list has predictive value for at least one of the training and test sets 
because it was observed to contribute to an optimum predictivity for a specific 
training/test set. The aggregate list was subdivided into smaller lists of genes based 
on the number of times a gene was predictive for an individual training or test set. 
For example, if 5 training and test sets were used, genes that were predictive in 5 
training and test sets were designated as Combo (combination) 5. Genes that were 
predictive in only 4 of 5 training and test sets were designated as Combo 4, etc. A 
list of predictive genes organized by their occurrence in the separate training and test 
sets is presented in Table 5. 
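
The grouping into Combo lists amounts to counting, for each gene, the number of training/test sets in which it appeared in the optimum predictive list. A short sketch with hypothetical gene names and a hypothetical function name:

```python
from collections import Counter


def combo_lists(optimal_gene_lists):
    """Map occurrence count -> sorted genes found in that many optimum lists."""
    counts = Counter(gene for genes in optimal_gene_lists for gene in set(genes))
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}


# Optimum predictive gene lists from five training/test sets (hypothetical).
lists = [["A", "B"], ["A", "C"], ["A", "B"], ["A"], ["A", "B"]]
print(combo_lists(lists))
```

With these inputs, gene `A` falls in Combo 5, gene `B` in Combo 3, and gene `C` in Combo 1; the aggregate list is the union of all Combo lists.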

[184] Example 2 

[185] Materials and Methods: The database used was as described in Example 1. 

[186] Array Data, Normalization and Transformation: Array data, normalization
procedures and transformations used in these analyses are as described in Example
1. Table 32 presents 24 hour gene expression data for the predictive genes. These
data can be used with a k nearest neighbor prediction model (as available in
GeneSpring or other statistical software packages) to make predictions as described
in this example.

[187] Class Prediction: The Predict Parameter Values tool in GeneSpring™ 

software was used for toxicity class prediction. A description of this tool and the 
statistical procedures used is provided in Example 1. 

[188] Training and Test Data Sets: The training and test data sets used are those
described in Table 2 of Example 1.

[189] Toxicology Classification: Toxicity classifications used are described in Table
1 of Example 1. In this analysis, randomized classifications (same number of "yes"
and "no" classifications distributed randomly among the samples) were also used.

[190] Prediction Output and Initial Data Processing: For each predicting gene list
used for evaluation, a table of data generated by the Predict Parameter Values tool in
GeneSpring™ software was saved which provided, for each sample in the test set, the
actual call ("yes" or "no" for toxicity), the predicted call ("yes", "no" or no call for
toxicity) and the P-value cutoff ratio. This set of data was used to calculate the predictive
performance measures provided below.

[191] Prediction Measures: Measures of prediction used for these analyses are
generally accepted prediction measures for information about actual and predicted
classifications done by a classification system (Modern Applied Statistics with S-Plus,
W. N. Venables and B. D. Ripley, Springer, 1994, 3rd edition; Proc. 14th International
Conference on Machine Learning, Miroslav Kubat, Stan Matwin, 1997). Results from
predictions of a two class case can be described as a two-class matrix:



                     Predicted Negative    Predicted Positive
Actual Negative      a                     b
Actual Positive      c                     d



[192] Standard terms used for prediction are: 

[193] Accuracy is the proportion of total number of predictions that are correct = 

a+d/a+b+c+c 

[194] False positive rate is the proportion of negative cases that are incorrectly

classified as positive = b/(a+b)

[195] False negative rate is the proportion of positive cases that are incorrectly

classified as negative = c/(c+d)

[196] Geometric-mean is the performance measure that takes into account the

proportion of positive and negative cases (Kubat et al., ibid) = the square root of
TP*TN where TP = true positive rate (d/(c+d)) and TN = true negative rate (a/(a+b)). In
these analyses, in cases where no prediction was made because the p-value ratio
exceeded the cutoff-value (generally 0.5), the non-call was considered to be incorrect.
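
The measures defined above can be computed directly from the four cell counts of the two-class matrix. The following Python sketch is illustrative only and is not the GeneSpring procedure; the function names `tally` and `prediction_measures` are hypothetical. It assumes that a non-call is placed in the incorrect cell for the sample's actual class (cell b for an actual negative, cell c for an actual positive), which is one way of treating a non-call as incorrect, and that both classes are present in the test set.

```python
import math

def tally(actual, predicted):
    """Count cells a, b, c, d of the two-class matrix.

    actual:    list of "yes"/"no" calls
    predicted: list of "yes"/"no"/None, where None marks a non-call
    Assumption: a non-call counts as incorrect for its actual class.
    """
    a = b = c = d = 0
    for act, pred in zip(actual, predicted):
        if act == "no":
            if pred == "no":
                a += 1      # actual negative, predicted negative
            else:
                b += 1      # actual negative, predicted positive or non-call
        else:
            if pred == "yes":
                d += 1      # actual positive, predicted positive
            else:
                c += 1      # actual positive, predicted negative or non-call
    return a, b, c, d

def prediction_measures(a, b, c, d):
    """Accuracy, error rates and geometric mean from the cell counts."""
    tp_rate = d / (c + d)   # true positive rate
    tn_rate = a / (a + b)   # true negative rate
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
        "geometric_mean": math.sqrt(tp_rate * tn_rate),
    }

# Hypothetical test set of six samples, one of which received no call.
actual = ["no", "no", "no", "no", "yes", "yes"]
predicted = ["no", "no", "yes", None, "yes", "no"]
measures = prediction_measures(*tally(actual, predicted))
```

Because the geometric mean multiplies the two class-specific rates, a classifier that labels every sample as negative scores zero on this measure even when its accuracy is high.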

[197] Random Selected Gene Sets: Subsets of randomly selected genes were 

prepared from the predictive gene sets to test whether such subsets would have 
predictive value. Assignments of genes to these subsets are presented in Tables 6-7.
Genes were also randomly selected from the list of all genes excluding the 142
twenty-four hour predictive genes (also known as non-predictive genes) by assigning 
a random number to each gene, sorting by the random number and selecting the 
appropriate number of sorted genes. Assignments of genes to these subsets are 
presented in Table 8. 
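
The random selection procedure described above (assign a random number to each gene, sort by that number, take the required number of genes) can be sketched as follows. This is a minimal Python illustration; the function names, the `seed` argument and the gene identifiers are hypothetical and are not taken from the tables.

```python
import random

def non_predictive_pool(all_genes, predictive_genes):
    """All array genes excluding those on the predictive list."""
    predictive = set(predictive_genes)
    return [g for g in all_genes if g not in predictive]

def random_gene_subset(genes, n, seed=None):
    """Assign a random number to each gene, sort, and keep the first n."""
    rng = random.Random(seed)
    keyed = [(rng.random(), g) for g in genes]
    keyed.sort()
    return [g for _, g in keyed[:n]]

# Hypothetical gene identifiers for illustration only.
all_genes = ["gene%d" % i for i in range(20)]
predictive = ["gene0", "gene1", "gene2"]
pool = non_predictive_pool(all_genes, predictive)
subset = random_gene_subset(pool, 5, seed=42)
```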

[198] Results: Prediction results for 24 hour expression data using genes 

identified as predictive are presented in Table 9. These data indicate a very high
accuracy in predicting toxicity. Mean accuracy exceeded 0.92 (92% accuracy) for
the entire predictive gene list (Combo All) and all the Combo gene lists. Because 
these predictions were conducted with multiple training/test set combinations it is 
possible to obtain an indication of the variability in prediction rates and robustness of 
the prediction capabilities of these gene sets. For the Combo All and other Combo 
lists there was very good predictivity for all training/test sets of data with over 0.75 
(75%) accuracy as a minimum value for any one training and test set and most lists 
giving over 0.8 (80%) minimum accuracy. False positive and false negative 
prediction rates were generally low with means generally less than 0.15 (15%) for all 
Combo lists. The geometric mean was used as an indication of predictive 
performance that includes consideration of the proportion of positive and negative 
classifications. All gene sets gave geometric mean measures >0.8 (80%) and four 
gene sets (Combo All, Combo 5, Combo 3 and Combo 2 gene lists) had mean 
measures >0.9. 

[199] As described in Materials and Methods, in those cases where no prediction

was made because the p-value ratio exceeded the cutoff-value (generally 0.5), the
non-call was considered to be incorrect.

[200] Prediction results for 24 hour expression data using genes identified as 

predictive and the predicting unit of compound-dose are presented in Table 10. This 
prediction unit is probably the most relevant for toxicology prediction. The 
performance of the genes in predicting compound-dose toxicity is even better than 
predictions on an individual animal basis. These data indicate a very high accuracy 
in predicting toxicity. Mean accuracy exceeded 0.9 (90% accuracy) for the entire 
predictive gene list (Combo All) and all of the Combo gene lists. Accuracy was
comparable for all the Combo lists. Variability in accuracy was low for most of the 
gene lists with >0.8 (80%) minimum accuracy for any single training and test set 
observed for the Combo All and Combo 5, 4, 2 and 1 gene lists. Particularly 
noteworthy in the compound-dose level prediction are the low false negative and
false positive rates observed for all of the Combo sets. The geometric mean
measure of predictive performance also indicated excellent predictive properties for
all gene sets. 

[201] One noteworthy feature of the predictive capability is the ability to distinguish 

between effects of a compound at different dose levels. Four compounds (ANIT, 
APAP, LPS and TET) produced toxicity at the high dose but not at the low dose. The 
predictive gene sets were usually accurate in predicting toxicity at the high dose and 
predicting no toxicity at the low dose. 

[202] Prediction results for 24 hour expression data using genes identified as 

predictive and the predicting unit of compound are presented in Table 11.

[203] Predictive performance on a compound basis was also strong, with accuracies and

geometric mean measures at or above 0.9 (90%) and very low false positive and false
negative error rates. Tables 12, 13, and 14 show the level of predictive accuracy of
individual genes of Combos 5, 4, and 3, respectively, for 24 hour data.

[204] The tables show that overall, individual genes of the Combo groups did not 

perform as well as the combination as a whole, as the average predictive accuracy of 
individual genes versus the entire combo set was 82.6% vs. 89.6% for Combo 5, 
80.8% vs. 85.7% for Combo 4, and 69.8% vs. 86.5% for Combo 3. The tables also
show that while many of the individual genes of the Combo groups gave a good
level of predictive accuracy (as high as 89.2% for individual genes of Combo 5, 
90.6% for Combo 4, and 82.1% for Combo 3), the predictive accuracy of individual 
genes rarely exceeded the predictive accuracy of the whole combination. 

[205] In order to assess the performance of subsets of genes, predictive 

performance was evaluated for subsets of genes randomly selected from the total 
combined predictive list (Combo All) and the top Combo sets (as defined in Materials 
and Methods). Prediction results for 24 hour expression data using randomly 
selected subsets of genes are presented in Table 15. These data clearly indicate 
that smaller subsets of the Combo gene lists have predictive power.

[206] Table 16 compares prediction accuracy for correct classification of toxicity

and for the same proportion of positive and negative toxicity calls randomly assigned 
to the samples (random classification). For each gene set or subset predictions were 
made using the same five training/test sets as for the other prediction analyses. 
Additionally, sets of genes that were not on the list of 142 predictive genes at 24 hours
(Example 1, Table 5) were randomly chosen from the array.

[207] It is clear from these data that the predictions with accurate classification are 

much better than predictions with randomized classification. This means that the 
predictive results are not simply due to chance and large data sets but are due to 
significant, meaningful predictive association between the gene expression of the 
predictive genes and the toxicity. The accuracy numbers for the gene sets selected 
from a list of all genes on the array minus the predictive genes are much lower than 
the Combo predictive lists and the random subsets of these predictive lists. This also 
verifies the predictive power of the identified predictive genes. The fact that the 
predictive numbers from these subsets are somewhat higher for accurate than 
random classification is likely due to some residual predictivity in these genes that is 
not very substantial. 

[208] Example 3 

[209] Materials and Methods: Compounds and treatments list used to construct 

the database are given in Table 1 of Example 1. This table also provides the 
evaluation of the toxicity observed as hepatocellular necrosis in samples collected 72 
hours after treatment. A database is described in detail in Example 1. This Example 
analyzes expression data from samples collected 6 hours after treatment. 

[210] Array Data, Normalization and Transformation: Array data, normalization 

and transformation procedures used were as described in Example 1. 

[211] Correlation with Histopathology Scores: Procedures and methods for 

obtaining gene lists correlating with histopathology scores were as described in
Example 1 (Table 1).

[212] Class Prediction: The Predict Parameter Values tool in GeneSpring™ 

software used for toxicity class prediction is described in detail in Materials and
Methods of Example 1.

[213] Training and Test Data Sets: Data were each separated into 5 training and 

test sets by randomly distributing the compounds into the sets. This was 
accomplished by assigning random numbers to lists of compounds that are negative 
and positive for histopathology, sorting by random number, and then dividing the 
sorted lists into a specific number of training and test sets. The training and test set
assignments are presented in Table 17.
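
The partitioning step described above can be sketched in Python as follows. This is an illustration under stated assumptions and not the actual assignment procedure: it assumes that each of the resulting sets holds out a different slice of the sorted negative and positive compound lists as the test set and uses the remaining compounds for training. The function name, the `seed` argument and the compound identifiers are hypothetical.

```python
import random

def split_training_test(negatives, positives, n_sets=5, seed=None):
    """Return a list of (training, test) compound lists.

    Negative and positive compounds are shuffled separately (random number
    assigned to each compound, then sorted) so that every test set receives
    a share of both classes.
    """
    rng = random.Random(seed)

    def shuffled(compounds):
        return [c for _, c in sorted((rng.random(), c) for c in compounds)]

    neg, pos = shuffled(negatives), shuffled(positives)
    sets = []
    for i in range(n_sets):
        # Assumption: every n_sets-th compound of each sorted list is held out.
        test = neg[i::n_sets] + pos[i::n_sets]
        held_out = set(test)
        training = [c for c in neg + pos if c not in held_out]
        sets.append((training, test))
    return sets

# Hypothetical compound identifiers: 10 negative and 5 positive compounds.
neg = ["n%d" % i for i in range(10)]
pos = ["p%d" % i for i in range(5)]
sets = split_training_test(neg, pos, n_sets=5, seed=1)
```

Distributing compounds rather than individual animals keeps all samples from one compound on the same side of each training/test division.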

[214] Toxicity Classification: toxicity classifications were entered for training and 

test set as a parameter column. Toxicity, as defined by observation of hepatocellular 
necrosis at 72 hours after treatment, was entered as a "yes" or "no" for each
animal in a compound-dose group. Additionally, a parameter column for random
histopathology classification was designated. This was done by randomly assigning
the "yes" and "no" calls to the individual animals such that the total numbers of "yes"
and "no" calls were the same as in the correctly assigned classification.
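
A random classification that preserves the numbers of "yes" and "no" calls amounts to shuffling the correctly assigned calls among the animals. The sketch below is a minimal Python illustration; the function name and the `seed` argument are hypothetical.

```python
import random

def randomize_classification(calls, seed=None):
    """Return the same "yes"/"no" calls in a random order.

    The counts of each call are unchanged; only their assignment to
    individual animals is randomized. The input list is not modified.
    """
    rng = random.Random(seed)
    shuffled = list(calls)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical example: 3 toxic and 9 non-toxic animals.
calls = ["yes"] * 3 + ["no"] * 9
random_calls = randomize_classification(calls, seed=0)
```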

[215] Prediction Output and Initial Data Processing: The "Predict Parameter 

Value" tool of GeneSpring was used with each of the training and test sets to 
generate predictions of histopathology classifications of the test sets. Unless 
otherwise specified, a nearest neighbor setting of 10 (default) and a P-value ratio cutoff
of 0.5 were used. The number of genes used to predict was varied with standard
numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes used. For each number of genes the
numbers of correct calls, incorrect calls and non-calls were recorded. Non-calls are
cases where no prediction was made because the P-value ratio exceeded the
specified P-value ratio cutoff. Calculations were made for overall percent correct
calls (number of correct classifications/number of samples), percent correct calls of
called samples (number of correct calls/number of samples with calls) and percent of
called samples (samples with calls/number of samples).

[216] For each input list and optimal number of predictive genes (lowest number of 

genes giving a maximum overall percent of correct calls) additional information was 
recorded that included the list of specific genes in the optimum predictive set. 
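
The three percentages and the choice of the optimal number of predictive genes can be illustrated with the short Python sketch below. The function names and the example counts are hypothetical; the sketch assumes that at least one sample received a call.

```python
def call_statistics(n_correct, n_incorrect, n_non_calls):
    """Percent correct overall, percent correct of called, percent called."""
    n_samples = n_correct + n_incorrect + n_non_calls
    n_called = n_correct + n_incorrect   # samples for which a call was made
    return {
        "overall_percent_correct": 100.0 * n_correct / n_samples,
        "percent_correct_of_called": 100.0 * n_correct / n_called,
        "percent_called": 100.0 * n_called / n_samples,
    }

def optimal_gene_count(overall_percent_by_count):
    """Lowest number of genes giving the maximum overall percent correct."""
    best = max(overall_percent_by_count.values())
    return min(n for n, pct in overall_percent_by_count.items() if pct == best)

# Hypothetical results: 16 correct calls, 2 incorrect calls, 2 non-calls.
stats = call_statistics(16, 2, 2)

# Hypothetical overall percent correct for several numbers of genes.
optimum = optimal_gene_count({50: 85.0, 40: 90.0, 30: 90.0, 20: 87.5})
```

In this hypothetical case both 40 and 30 genes reach the maximum, so the lower number, 30, is taken as the optimum.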

[217] Results: Expression array data were examined for the existence of genes 

whose expression correlated with histopathology scores. Table 1 in Materials and 
Methods of Example 1 presents a list of the compounds and dose levels along with 
the histopathology classification and histopathology severity scores used for this 
analysis. For each distance measure the probability was adjusted in increments of 
0.05 until at least 50 correlating genes were obtained. Lists of correlating genes 
were obtained using the distance measures described in Materials and Methods. 
Example sets of correlating genes are provided in Tables 18-19. 

[218] The correlating gene lists as well as the entire array gene list were provided
as input lists to the GeneSpring Predict Parameter Value tool (described in Materials
and Methods) that employs a K-means nearest neighbor (knn) predictive model.
These lists as well as the entire array gene list were used for each of the five training
and test sets defined in Materials and Methods to generate predictions of
histopathology classifications of the test sets. Input genes for the Predict Parameter
Value feature included all 700 genes in the GenePix file (the Rat CT Array) as well as
smaller lists of genes whose expressions correlated with histopathology by the
correlation measures described previously. The number of genes used to predict was
varied with standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes used. The
specified number of predictive genes was varied to obtain an optimum number of
predictive genes.

[219] After this was done for 5 training and test sets, gene lists were then merged 

to create one aggregate list of predictive genes. Each gene on this aggregate list 
has predictive value for at least one of the training and test sets because it was 
observed to contribute to an optimum predictivity for a specific training/test set. The
aggregate list was subdivided into smaller lists of genes based on the number of
times a gene was predictive for an individual training or test set. For example, if 5 
training and test sets were used, genes that were predictive in 5 training and test sets 
were designated as Combo (combination) 5. Genes that were predictive in only 4 of 
5 training and test sets were designated as Combo 4, etc. 
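
The subdivision of the aggregate list into Combo lists is a count of how many training and test sets each gene was predictive in. A minimal Python sketch follows; the function name and the gene identifiers are hypothetical and do not correspond to entries in Table 20.

```python
from collections import Counter

def combo_lists(optimum_sets):
    """Group genes by the number of training/test sets in which they occur.

    optimum_sets: one list of optimum predictive genes per training/test set.
    Returns a dict mapping the occurrence count (e.g. 5 for Combo 5) to a
    sorted list of genes.
    """
    counts = Counter(g for genes in optimum_sets for g in set(genes))
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}

# Hypothetical optimum gene sets from five training/test sets.
optimum_sets = [
    ["g1", "g2", "g3"],
    ["g1", "g2"],
    ["g1", "g2", "g4"],
    ["g1", "g2"],
    ["g1", "g3"],
]
combos = combo_lists(optimum_sets)
```

The union of all Combo lists corresponds to the aggregate list referred to as Combo All.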

[220] A list of predictive genes organized by their occurrence in the separate 

training and test sets is presented in Table 20. 

[221] Example 4 

[222] Materials and Methods: The database used was as described in Example 1. 

[223] Array Data, Normalization and Transformation: Array data, normalization 

procedures and transformations used in these analyses are as described in Example 
1. Table 34 lists 6 hour gene expression data for the predictive genes. These data 
can be used with a k-means nearest neighbor prediction model (as available in 
GeneSpring or other statistical software packages) to make predictions as described 
in this example.

[224] Class Prediction: The Predict Parameter Values tool in GeneSpring™ 

software was used for toxicity class prediction. A description of this tool and the 
statistical procedures used is provided in Example 1. 

[225] Training and Test Data Sets: The training and test data sets used are those 

described in Table 17 of Example 3. 

[226] Toxicology Classification: toxicology classifications used are described in 

Table 1 of Example 1. In this analysis randomized classifications (same number of
"yes" and "no" classifications distributed randomly among the samples) were also used.

[227] Prediction Output and Initial Data Processing: For each predicting gene list

used for evaluation a table of data generated by the Predict Parameter Values tool in
GeneSpring™ software was saved which provided for each sample in the test set the
actual call ("yes" or "no" for toxicity), the predicted call ("yes", "no" or no call for 
toxicity) and the P-value cutoff ratio. This set of data was used to calculate predictive 
performance measures provided below. 

[228] Prediction Measures: Measures of prediction used for these analyses are 

generally accepted prediction measures for information about actual and predicted 
classifications done by a classification system (Modern Applied Statistics with S-Plus,
W. N. Venables and B. D. Ripley, Springer, 1994, 3rd edition; Proc. 14th International
Conference on Machine Learning, Miroslav Kubat, Stan Matwin, 1997). Results from 
predictions of a two class case can be described as a two-class matrix: 



                        Predicted
                    Negative   Positive
Actual   Negative      a          b
         Positive      c          d



[229] Standard terms used for prediction are: 

[230] Accuracy is the proportion of the total number of predictions that are correct =

(a+d)/(a+b+c+d)

[231] False positive rate is the proportion of negative cases that are incorrectly

classified as positive = b/(a+b)

[232] False negative rate is the proportion of positive cases that are incorrectly

classified as negative = c/(c+d)

[233] Geometric-mean is the performance measure that takes into account the

proportion of positive and negative cases (Kubat et al., ibid) = the square root of
TP*TN where TP = true positive rate (d/(c+d)) and TN = true negative rate (a/(a+b)). In
these analyses, in cases where no prediction was made because the p-value ratio
exceeded the cutoff-value (generally 0.5), the non-call was considered to be incorrect.

[234] Results: Prediction results for 6 hour expression data using genes identified 

as predictive are presented in Table 21. These data indicate accuracy in predicting 
toxicity with 6 hr expression data. Mean accuracy exceeded 0.7 (70% accuracy) for 
the entire predictive gene list (Combo All) and 0.6 (60%) for the Combo gene lists.
Mean false positive and false negative values were in the range of 0.3-0.4 for the
best predicting gene sets and the geometric mean measures were higher than 0.6 
except for the Combo 1 gene set. Comparison of predictive performance for correct 
and random classification is given in Table 22. 

[235] It is clear from these data that the predictions with accurate classification are

much better than predictions with randomized classification. This means that the 
predictive results are not simply due to chance and large data sets but are due to 
significant, meaningful predictive association between the gene expression of the 
predictive genes and the toxicity. 

[236] Example 5 

[237] Database - Compounds and Toxicity: Compounds and treatments list used 

to construct the database are given in Table 1 of Example 1. This table also 
provides the evaluation of the toxicity observed as hepatocellular necrosis in samples 
collected 72 hours after treatment. The Phase-1 Database is described in detail in 
Example 1. This Example analyzes expression data from samples collected 72 
hours after treatment.

[238] Array Data, Normalization and Transformation: Array data, normalization 

and transformation procedures used were as described in Example 1. 

[239] Correlation with Histopathology Scores: Procedures and methods for 

obtaining gene lists correlating with histopathology scores were as described in 
Example 1 with scores as in Example 1, Table 1.

[240] Class Prediction: The Predict Parameter Values tool in GeneSpring™

software used for toxicity class prediction is described in detail in Materials and
Methods of Example 1.

[241] Training and Test Data Sets: Data were each separated into 5 training and 

test sets by randomly distributing the compounds into the sets. This was 
accomplished by assigning random numbers to lists of compounds that are negative 
and positive for histopathology, sorting by random number, and then dividing the 
sorted lists into a specific number of training and test sets. The training and test set 
assignments are presented in Table 23.

[242] Toxicology Classification: toxicity classifications were entered for training 

and test set as a parameter column. Toxicity, as defined by observation of 
hepatocellular necrosis at 72 hours after treatment, was entered as a "yes" or
"no" for each animal in a compound-dose group. Additionally, a parameter column 
for random histopathology classification was designated. This was done by randomly 
assigning the same number of "yes" and "no" calls to the individual animals. 

[243] Prediction Output and Initial Data Processing: The 'Predict Parameter Value' 

tool of GeneSpring was used with each of the training and test sets to generate 
predictions of histopathology classifications of the test sets. Unless otherwise 
specified a nearest neighbor setting of 10 (default) and P-value ratio cutoff of 0.5 was 
used. The number of genes used to predict was varied with standard numbers of 50, 
40, 30, 20, 10, 5, 2 and 1 genes used. For each number of genes the numbers of 
correct calls, incorrect calls and non-calls were recorded. Non-calls are cases where 
no prediction was made because the P-value ratio exceeded the specified P-value 
ratio cutoff. Calculations were made for overall percent correct calls (number of 
correct classifications/number of samples), percent correct calls of called samples
(number of correct classifications/number of samples with calls) and percent of called 
samples (samples with calls/number of samples). 

[244] For each input list and optimal number of predictive genes (lowest number of
genes giving a maximum overall percent of correct calls) additional information was
recorded that included the list of specific genes in the optimum predictive set. 

[245] Results: Expression array data were examined for the existence of genes 

whose expression correlated with histopathology scores. Table 1 in Materials and 
Methods of Example 1 presents a list of the compounds and dose levels along with 
the histopathology classification and histopathology severity scores used for this 
analysis. For each distance measure the probability was adjusted in increments of 
0.05 until at least 50 correlating genes were obtained. Lists of correlating genes 
were obtained using the distance measures described in Materials and Methods. 
Example sets of correlating genes are provided in Tables 24-25. 

[246] The correlating gene lists as well as the entire array gene list were provided
as input lists to the GeneSpring Predict Parameter Value tool (described in Materials
and Methods) that employs a K-means nearest neighbor (knn) predictive model.
These lists as well as the entire array gene list were used for each of the five training
and test sets defined in Materials and Methods to generate predictions of
histopathology classifications of the test sets. Input genes for the Predict Parameter
Value feature included all 700 genes in the GenePix file (the Rat CT Array) as well as
smaller lists of genes whose expressions correlated with histopathology by the
correlation measures described previously. The number of genes used to predict was
varied with standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes used. The
specified number of predictive genes was varied to obtain an optimum number of
predictive genes.

[247] After this was done for 5 training and test sets, all gene lists were then 

merged to create one aggregate list of predictive genes. Each gene on this 
aggregate list has predictive value for at least one of the training and test sets 
because it was observed to contribute to an optimum predictivity for a specific 
training/test set. The aggregate list was subdivided into smaller lists of genes based 
on the number of times a gene was predictive for an individual training or test set.
For example, if 5 training and test sets were used, genes that were predictive in 5
training and test sets were designated as Combo (combination) 5. Genes that were
predictive in only 4 of 5 training and test sets were designated as Combo 4, etc. 

[248] A list of predictive genes organized by their occurrence in the separate 

training and test sets is presented in Table 26. 

[249] Example 6 

[250] Database: The database used was as described in Example 1.

[251] Array Data, Normalization and Transformation: Array data, normalization 

procedures and transformations used in these analyses are as described in Example 
1. Table 36 presents 72 hour gene expression data for the predictive genes. These
data can be used with a k-means nearest neighbor prediction model (as available in 
GeneSpring or other statistical software packages) to make predictions as described 
in this example. 

[252] Class Prediction: The Predict Parameter Values tool in GeneSpring™ 

software was used for toxicity class prediction. A description of this tool and the 
statistical procedures used is provided in Example 1. 

[253] Training and Test Data Sets: The training and test data sets used are those

described in Table 23 of Example 5.

[254] Toxicology Classification: toxicology classifications used are described in 

Table 1 of Example 1. In this analysis randomized classifications (same number of 
"yes" and "no" classifications distributed randomly among the samples) were also 
used. 

[255] Prediction Output and Initial Data Processing: For each predicting gene list

used for evaluation a table of data generated by the Predict Parameter Values tool in
GeneSpring™ software was saved which provided for each sample in the test set the
actual call ("yes" or "no" for toxicity), the predicted call ("yes", "no" or no call for
toxicity) and the P-value cutoff ratio. This set of data was used to calculate predictive 
performance measures provided below. 

[256] Prediction Measures: Measures of prediction used for these analyses are

generally accepted prediction measures for information about actual and predicted
classifications done by a classification system (Venables and Ripley, ibid; Kubat and
Matwin, ibid). Results from predictions of a two-class case can be described as a
two-class matrix:

                        Predicted
                    Negative   Positive
Actual   Negative      a          b
         Positive      c          d

[257] Standard terms used for prediction are the same as in Example 2. In these

analyses, in cases where no prediction was made because the p-value ratio exceeded
the cutoff-value (generally 0.5), the non-call was considered to be incorrect.

[258] Results: Prediction results for 72 hour expression data using genes 

identified as predictive are presented in Table 27. These data indicate accuracy in 
predicting toxicity with 72 hr expression data. Mean accuracy exceeded 0.7 (70% 
accuracy) for the entire predictive gene list (Combo All) and Combo 4, 3 and 2 sets 
and 0.55 (55%) for the Combo 1 and 5 gene lists. Mean false positive and false 
negative values were in the range of 0.2-0.4 for the best predicting gene sets and the 
geometric mean measures were higher than 0.6 for all gene sets. 

[259] Comparison of predictive performance for correct and random classification 

is given in Table 28. 

[260] It is clear from these data that the predictions with accurate classification are 

much better than predictions with randomized classification. This means that the
predictive results are not simply due to chance and large data sets but are due to
significant, meaningful predictive association between the gene expression of the 
predictive genes and the toxicity. 

[261] Example 7 

[262] Predictive Modeling: The predictive task with the toxicology gene expression

data is a two-class classification problem, where the two classes of possible 
responses are defined by either hepatocellular necrosis (yes) or absence of 
hepatocellular necrosis (no). This is an uneven class problem in that the class of yes 
responses is roughly 20 percent of the data or less in the database tested. A 
discrimination function can be used to classify a training set. This function can be 
cross-validated with a testing set, often repeatedly to quantify the mean and variation 
of the classification error. There are numerous common discrimination functions, and 
a comparative study of the performance of these functions is useful in determining 
the best classifier. Additional measures can then be used to compare the 
performance of the classifiers. Since the classes are of significantly uneven sizes, 
a geometric mean measure (GMM) can be used to compare models, namely, the
square root of the product of the true positive rate and the true negative rate.

[263] Common discrimination methods are Fisher's linear discriminant, quadratic 

discriminant (Mahalanobis distance), k-nearest neighbors (knn), logistic discriminant
(MacLachlan, 1992), classification trees (or more generally known as recursive
partitioning) (Breiman et al., 1984; Clark and Pregibon, 1993; Quinlan and Kaufman,
1988), and neural network classifiers (Ripley, 1996). Most are formula-based, such
as linear and quadratic discriminant, whereas others are rule-based, such as
recursive partitioning, or algorithmically based, such as knn. knn is also database
dependent in that a database containing the training set is needed to perform the
nearest neighbor search and classification.

[264] Classifier Models: A variety of common classification techniques are 

available. A simple hybrid classifier could be designed and tested, using the knn
results, to transform the knn model into a database independent model. This model
is termed a centroid model. The centroid model uses the correctly identified test data 
results from knn and locates a centroid of the subset of k samples that are of the 
same class for each correctly identified test sample. The centroid is assigned the 
correct class, and with new test data, a sample is assigned the class of its nearest 
centroid. 
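
One possible reading of the centroid model is sketched below in Python. It is an illustration under stated assumptions rather than a definitive implementation: Euclidean distance, a simple majority vote for knn, and an odd k for the two-class case are assumptions, and all function names and data values are hypothetical. For each test sample that knn classifies correctly, the centroid is taken over those of its k nearest training samples that share its class.

```python
import math

def distance(u, v):
    """Euclidean distance between two expression vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def k_nearest(sample, training, k):
    """The k training entries (vector, label) closest to the sample."""
    return sorted(training, key=lambda item: distance(sample, item[0]))[:k]

def knn_predict(sample, training, k):
    """Majority vote among the k nearest neighbors (use odd k for two classes)."""
    labels = [label for _, label in k_nearest(sample, training, k)]
    return max(set(labels), key=labels.count)

def build_centroids(training, test, k):
    """One labeled centroid per test sample that knn classifies correctly."""
    centroids = []
    for vector, actual in test:
        if knn_predict(vector, training, k) != actual:
            continue  # only correctly identified test samples contribute
        same = [v for v, label in k_nearest(vector, training, k) if label == actual]
        centroid = [sum(col) / len(same) for col in zip(*same)]
        centroids.append((centroid, actual))
    return centroids

def centroid_predict(sample, centroids):
    """Assign the class of the nearest centroid; no training database needed."""
    return min(centroids, key=lambda c: distance(sample, c[0]))[1]

# Hypothetical two-gene expression vectors.
training = [([0.0, 0.0], "no"), ([0.0, 1.0], "no"), ([1.0, 0.0], "no"),
            ([5.0, 5.0], "yes"), ([5.0, 6.0], "yes"), ([6.0, 5.0], "yes")]
test = [([0.5, 0.5], "no"), ([5.5, 5.5], "yes")]
centroids = build_centroids(training, test, k=3)
```

Once the centroids have been stored, a new sample can be classified with `centroid_predict` alone, which is what makes the model independent of the training database.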

[265] In addition to the knn and centroid models described above, tree,

logistic, and neural network models could also be employed. The neural network is a
simple, feed-forward network, allowing skip layers, and with an entropy fitting 
criterion. 

[266] Example 8 

[267] Animal Treatment and Tissue Harvest: Male Sprague-Dawley rats in groups 

of 3 were treated by intraperitoneal injection with test compounds (thioacetamide, 
200 mg/kg and α-naphthylisothiocyanate (ANIT), 100 mg/kg) or only with the vehicle
in which the compound was mixed. At specified timepoints (24h and 72h) the rats
were euthanized and tissues collected. Tissues were immediately placed into liquid
nitrogen and frozen within 3 minutes of the death of the animal to ensure that mRNA 
did not degrade. The tissues were sent blinded to be tested. The organs/tissues 
were then packaged into well-labeled plastic freezer quality bags and stored at -80 
degrees until needed for isolation of the mRNA from a portion of the organ/tissue 
sample. 

[268] Gene Expression Measurement: Isolation of RNA, preparation of cDNA 

labeled probes and hybridization procedures were as described in Example 1
Materials and Methods. Probes were hybridized to the rat CT Chip which is the 
same array as used for the database. 

[269] Data Analysis 

[270] Array data from the samples was loaded into GeneSpring software using the
same procedures as used for the database. No toxicity parameters were entered for
these samples. The Predict Parameter Value tool was used to make toxicity 
predictions using different Combo Gene sets from the 24 hour data and the entire 
database as the training set. Other values used were 10 nearest neighbors and a p- 
value ratio cutoff of 0.5. 

[271] Results: Table 29 presents predictions for samples that were external to the 

database used to derive the predictive genes. The samples were from
replicate animals treated with thioacetamide or ANIT. One of these compounds
(ANIT) is also represented in the database (at a different dose level) and the other
compound, thioacetamide, is not in the database. Histopathology conducted on the 
samples verified that these treatments induced hepatocellular necrosis. Each of the 
Combo gene sets correctly predicted that these samples had expression patterns 
indicative of toxicity. 

[272] These results demonstrate clearly that the discovered sets of predictive 

genes in conjunction with the database and K-means nearest neighbor model can 
accurately predict toxicity from microarray data that is external to the database. 
Because the database consists mostly of non-toxic samples the prediction of toxicity 
for these samples is significantly different from what would be expected from chance. 
It is also noteworthy that five different sets of predictive genes are capable of making 
accurate predictions. 



[273] This result provides a clear example of the predictive utility of this invention. 

[274] Example 9 

[275] Gene Expression Data: Gene expression data used for cluster analysis were 

the 24 hour expression data of the 68 genes of the combined Combo 5, 4, 3 and 2 
predictive gene sets. These data are contained in Table 35. 

[276] Cluster Analysis: Cluster analysis tools used in these analyses included K-means

and gene tree features of GeneSpring software.

[277] Results: Figure 5 presents combined results of K-means and gene-tree

hierarchical clustering analysis. Combo 5, 4, 3 and 2 (68 genes) were clustered 
using K-means (number of clusters 8, maximum iteration 100, similarity measure 
Pearson) and Gene tree (separation ratio 0.5, minimum distance 0.001, similarity 
measure Pearson). The k-means clusters are colored according to the corresponding 
set 1 to set 8). The gene on the display from left to right correspond to the gene 
names top to bottom in the Table 30. These data indicate that the predictive genes 
can be organized into sets of genes which have similar expression patterns. 
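K-means clustering with a Pearson similarity measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeneSpring algorithm: initial centroids are taken from caller-supplied seed profiles, each centroid is the plain mean of its members, and the function name `kmeans_pearson` and its `seeds` parameter are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def kmeans_pearson(profiles, seeds, max_iterations=100):
    """Assign each profile to the cluster whose centroid it correlates with best.

    seeds: indices of the profiles used as initial centroids, one per cluster
    (an assumption of this sketch; other initializations are possible).
    Returns a list of cluster indices, one per profile.
    """
    centroids = [list(profiles[i]) for i in seeds]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            max(range(len(centroids)), key=lambda c: pearson(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # no profile changed cluster, so the result is stable
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

With the settings reported above, the call would use eight seed profiles and `max_iterations=100`. Because Pearson correlation ignores scale, genes with the same shape of response across samples cluster together even when their magnitudes differ.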

[278] It is understood that the examples and embodiments described herein are for
illustrative purposes only and that various modifications or changes in light thereof
will be suggested to persons skilled in the art and are to be included within the spirit
and purview of this application and scope of the appended claims. All publications,
patents and patent applications cited herein are hereby incorporated by reference in
their entirety for all purposes to the same extent as if each individual publication,
patent or patent application were specifically and individually indicated to be so
incorporated by reference.

What is claimed is:

1. A method of predicting the liver toxicity in an individual to an agent
comprising the steps of: 

obtaining a biological sample from the individual treated with the agent; 
measuring the expression of one or more liver toxicity predictive genes in 
the sample, wherein the genes are selected from the group consisting of 
partial gene sequences of genes identified as responsive to agents causing 
liver necrosis, thereby generating a test expression profile; and 
using the test expression profile with a set of reference expression profiles 
in a Predictive Model to determine whether the agent will induce liver 
toxicity in the individual. 

2. The method according to claim 1, wherein the liver toxicity predictive genes
are selected from the group of partial gene sequences listed in Table 32 
that represent 24 hour combo All genes. 

3. The method according to claim 2, wherein the partial gene sequences 
correspond to rat genes. 

4. The method according to claim 2, wherein the partial gene sequences 
correspond to dog genes. 

5. The method according to claim 2, wherein the partial gene sequences 
correspond to non-human primate genes. 

6. The method according to claim 2, wherein the partial gene sequences 
correspond to human genes. 

7. The method according to claim 1, wherein the liver toxicity predictive genes
are selected from the group of partial gene sequences listed in Table 32 
that represent 24 hour combo 2 genes. 

8. The method according to claim 7, wherein the partial gene sequences
correspond to rat genes.

9. The method according to claim 7, wherein the partial gene sequences 
correspond to dog genes. 

10. The method according to claim 7, wherein the partial gene sequences 
correspond to non-human primate genes. 

11. The method according to claim 7, wherein the partial gene sequences 
correspond to human genes. 

12. The method according to claim 1, wherein the liver toxicity predictive genes
are selected from the group of partial gene sequences listed in Table 32 
that represent 24 hour Combo 5 genes. 

13. The method according to claim 12, wherein the partial gene sequences 
correspond to rat genes. 

14. The method according to claim 12, wherein the partial gene sequences 
correspond to dog genes. 

15. The method according to claim 12, wherein the partial gene sequences 
correspond to non-human primate genes. 

16. The method according to claim 12, wherein the partial gene sequences 
correspond to human genes. 

17. A method of predicting the liver toxicity of an agent using an in vitro system, 
comprising the steps of: 

obtaining a biological sample from in vitro cultured cells or explants
treated with the agent;

measuring the expression of one or more liver toxicity predictive genes in
the sample, wherein the genes are selected from the group consisting of
partial gene sequences of genes identified as responsive to agents causing
liver necrosis, thereby generating a test expression profile; and
using the test expression profile with a set of reference expression profiles
in a Predictive Model to determine whether the agent will induce liver
toxicity in the individual.

18. The method according to claim 17, wherein the liver toxicity predictive 
genes are selected from the group of partial gene sequences listed in Table 
32 that represent 24 hour combo All genes. 

19. The method according to claim 18, wherein the partial gene sequences 
correspond to rat genes. 

20. The method according to claim 18, wherein the partial gene sequences 
correspond to dog genes. 

21. The method according to claim 18, wherein the partial gene sequences 
correspond to non-human primate genes. 

22. The method according to claim 18, wherein the partial gene sequences 
correspond to human genes. 

23. The method according to claim 17, wherein the liver toxicity predictive 
genes are selected from the group comprising of 24 hour Combo 2 genes. 

24. The method according to claim 23, wherein the partial gene sequences 
correspond to rat genes. 

25. The method according to claim 23, wherein the partial gene sequences 
correspond to dog genes. 

26. The method according to claim 23, wherein the partial gene sequences 
correspond to non-human primate genes. 

27. The method according to claim 23, wherein the partial gene sequences
correspond to human genes.

28. The method according to claim 17, wherein the liver toxicity predictive 
genes are selected from the group of partial gene sequences listed in Table 
32 that represent 24 hour Combo 5 genes. 

29. The method according to claim 28, wherein the partial gene sequences 
correspond to rat genes. 

30. The method according to claim 28, wherein the partial gene sequences 
correspond to dog genes. 

31. The method according to claim 28, wherein the partial gene sequences
correspond to non-human primate genes. 

32. The method according to claim 28, wherein the partial gene sequences 
correspond to human genes. 

33. A process for predicting the liver toxicity in a biological sample from an 
individual, an in-vitro cell cultures or explants to an agent via a 
programmable machine, the process comprising the steps of: 

obtaining a biological sample treated with the agent;
measuring the expression of one or more liver toxicity predictive genes in
the sample, wherein the genes are selected from the group consisting of
partial gene sequences of genes identified as responsive to agents causing
liver necrosis, thereby generating a test expression profile; and
using the test expression profile with a set of reference expression profiles
in a Predictive Model to determine whether the agent will induce liver
toxicity in the individual.

34. A computer program product for enabling a computer to perform Predictive
Model analysis for liver toxicity on a biological sample from an individual, an
in-vitro cell cultures or explants to an agent, the computer program product
comprising:

software instructions for enabling the computer to perform predetermined 
operations, and a computer readable medium embodying the software 
instructions; 

The pre-determined operations comprising: 

measuring an expression of one or more liver toxicity predictive genes in a 
sample, wherein the genes are selected from the group consisting of partial 
gene sequences of genes identified as responsive to agents causing liver 
necrosis, thereby generating a test expression profile; and 
using the test expression profile with a set of reference expression profiles 
in a Predictive Model to determine whether the agent will induce liver 
toxicity in the individual. 

35. A computer system adapted to predict liver toxicity in a biological sample
from an individual, an in-vitro cell cultures, or explants to an agent, 
comprising a processor and a memory including software instructions 
adapted to enable the computer system to perform operations comprising: 
measuring the expression of one or more liver toxicity predictive genes in 
the sample, wherein the genes are selected from the group consisting of 
partial gene sequences of genes identified as responsive to agents causing 
liver necrosis, thereby generating a test expression profile; and 

using the test expression profile with a set of reference expression profiles 
in a Predictive Model to determine whether the agent will induce liver 
toxicity in the individual. 

36. A computer program product for predicting liver toxicity from a test sample
expression profile, comprising: 

an encrypted training data set; 

encrypted lists of genes selected from genes predictive of liver toxicity to be
used with the encrypted training data set, and

a Predictive Model that uses the encrypted training data sets, the encrypted
lists of genes, and the test sample expression profile to predict the liver
toxicity of the test sample.

37. The computer program product of claim 36, wherein the encrypted lists of 
genes are selected from any Combination Category appearing in Tables 5, 
20 and 26. 

38. The computer program product of claim 36, wherein the encrypted lists of 
genes comprise a 24 hour Combo All genes as set in Table 5. 

39. The computer program product of claim 36, wherein the encrypted lists of 
genes comprise a 6 hour Combo All genes as set in Table 20. 

40. The computer program product of claim 36, wherein the encrypted lists of 
genes comprise a 72 hour Combo All genes as set in Table 26. 

41. A method for mining genes predictive for liver toxicity, comprising the steps
of: 

collecting expression levels of a plurality of candidate toxicity predictive 

genes among a multiplicity of samples; 

defining a group of samples to be a training set; 

defining another group of samples to be a test set; 

optionally generating additional training and test sets; and 

selecting a set of genes which are predictive of liver toxicity based on 

evaluating the training and test sets in a Predictive Model. 

42. The method according to claim 41, wherein the expression levels are stored
as a database on an electronic medium. 

43. An integrated system for predicting liver toxicity, comprising: 

means for measuring gene expression profiles of genes predictive of liver
toxicity from biological samples exposed to a test agent; and
a computer system operably linked to the means wherein the computer
system is capable of implementing a Predictive Model.

Discovery of Predictive Genes for Liver Toxicity



Liver Database 
Liver samples — rats treated with 45 cpds 
Rat CT Expression array data for samples 
Pathology data (72h samples) 




Pathology scores (semiquant.) 



correlation analyses 



Lists of genes correlating with 
histopathology scores 



Classification of liver toxicity
"yes" or "no" for each sample



Assignment of cpds/sample 
array data into 5 different 
training/test sets 

Training Set 1 Set 2 Set 5 

Test Set 1 Set 2 Set 5 



Predictive Model 
(predict liver toxicity classification) 



Vary number of genes used in prediction 
Obtain optimum gene list (lowest number of 
genes with highest accuracy) for each input 
gene list and training/test set 



Merge optimum predictive gene lists for each training/test set 
Train/Test 1 List Train/Test 2 List.... Train/Test 5 List 



Merge All Train/Test Lists Into Combined List of Predictive Genes 
(Combo All) 

Sort Genes into Combinations by Number of Occurrences on 
Individual Training/Test Lists 

(Combo 5,.. .Combo 1) 



Figure 1 
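The final merging and sorting steps of Figure 1 amount to counting how many of the individual training/test gene lists each gene appears on. The sketch below is illustrative only: the function names `combo_categories` and `combo_all` are hypothetical, and the gene identifiers used with it are placeholders, not genes from the tables.

```python
from collections import Counter


def combo_categories(gene_lists):
    """Group genes by the number of training/test gene lists they appear on.

    gene_lists: one list of gene names per training/test set.
    Returns {occurrences: [genes]}, highest occurrence count first.
    """
    # set(genes) guards against a gene being counted twice within one list.
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    categories = {}
    for gene, occurrences in counts.items():
        categories.setdefault(occurrences, []).append(gene)
    return {n: sorted(genes)
            for n, genes in sorted(categories.items(), reverse=True)}


def combo_all(gene_lists):
    """Union of all lists: every gene that appears on at least one list."""
    return sorted(set().union(*gene_lists))
```

With five training/test lists, a gene found on all five falls into the key 5 group (Combo 5), and a gene found on only one list falls into the key 1 group (Combo 1).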
Evaluation of Predictive Genes for Liver Toxicity



Evaluated Gene Lists 
Combo All and Combo Sets 
Individ. genes in best Combo sets
Randomly selected subsets 
Cumulative genes in Combo sets 
Subsets of "non-predictive" genes 



5 different training/test sets 
(same as for identification) 

Training Set 1 Set 2 Set 5 

Test Set 1 Set 2 Set 5 

Accurate and random 
classifications 



Predictive Model (KNN) 



Predictive Performance 

(means and ranges for 5 different training/test sets) 

Prediction Units— Sample, Cpd-Dose, Cpd 

Accuracy — proportion of correct classifications 

False positive — proportion of incorrect classifications for negative 
samples 

False negative — proportion of incorrect classifications for positive 
samples 

Geometric Mean— measure of predictive performance that considers 
proportion of pos. and neg. samples 

Comparison of accuracy for accurate and random classification 



Figure 2 
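The performance measures listed in Figure 2 can be computed from paired actual and predicted classifications. The sketch below makes two assumptions that the figure does not state: the geometric mean is taken as the square root of the product of the correct-call rates for positive and negative samples, and "no call" predictions are left out of the tallies. The function name `predictive_performance` is hypothetical.

```python
from math import sqrt


def predictive_performance(actual, predicted, positive="yes", negative="no"):
    """Summarize predictions against known toxicity classifications.

    Predictions other than `positive` or `negative` (for example "no call")
    are skipped; that handling is an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for truth, call in zip(actual, predicted):
        if call not in (positive, negative):
            continue
        if truth == positive:
            if call == positive:
                tp += 1
            else:
                fn += 1
        else:
            if call == negative:
                tn += 1
            else:
                fp += 1
    pos, neg = tp + fn, tn + fp
    total = pos + neg
    false_negative = fn / pos if pos else 0.0
    false_positive = fp / neg if neg else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive": false_positive,
        "false_negative": false_negative,
        # Balances performance on positive and negative samples, which
        # matters when most samples in the database are non-toxic.
        "geometric_mean": sqrt((1 - false_negative) * (1 - false_positive)),
    }
```

The same function can be applied at the sample, compound-dose or compound level by passing one actual and one predicted classification per prediction unit.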
Applications of Liver Toxicity
Predictive Model 

KNN Model Input 



Prediction of liver toxicity ("yes", "no" or "no call") for 
each test sample 

Toxicity at early time point without pathology 

Prediction based on quantitative data/model (not 
subjective like pathology) 

Prediction of no-effect dose level 

Prediction of chronic toxicity? 

Prediction of in vivo toxicity from in vitro samples 

(possible human in vitro prediction) 



Expression data base and 
toxicity classifications 
(training data) 




Predictive Gene List(s) 
(e.g. best Combo) 



Expression data for test 
samples (test set) 

in vivo liver samples 

in vitro samples 



KNN Predictive Model 



Figure 3

Overall Percent Accuracy vs. Number of Predictor Genes
Training/Test Set 3 Pearson Correlating Genes

[Plot: y-axis, overall percent accuracy (50% to 100%); x-axis, Number of Predictor Genes]

Figure 4

Figure 5

K-Means and Tree Cluster analysis of Combo 5, 4, 3 and 2 Genes

Individual genes associated with each of the 8 clusters are presented in Table 30.

Table 1 Compounds, Dose Levels, Liver Pathology and Abbreviations in the Database



Compound 


Dose Level 


Abbreviation 


J^lVCi 


Score** 


i -napxiuiyiiso iniocyanaie 


1 Sin oVor 
1 Jlllgtt.g 


A XJTT 1 S 




1 


i -napnuiynsotmocyauatc 


ouuigKg 


ANTT 60 

■TVLNll \J\J 


Yes 


2 


0 -iiuorouracn 


1 "3 m <y/]rcr 

id mgfitg 




no 


1 


j-iiuorourocu 


ju mgfK.g 


S-PTT SO 


JL1VJ 


1 


ac eiaininopnen 


OSO rna/lf a 


APAP 9S0 


ViSJ 


1 


ac etaniinopnen 


1 000 ma/lra 

luuu illg/lVg 


APAP 1000 


Yes 


2 


allaXOXin r> 1 


I mg/jvg 


APT R 1 


Yes f no 24W 


8 


amphotericin B 


C *vi ctf\ro 

j mg/icg_ 


AMPR S 


No 


1 

JL 


amphotericin B 


On m nflrcr 

zu mg/Kg 


AMPR 90 
/VLVJLl -D Zv 


No 


1 
1 


azathioprine 


du mg/jcg 


A7A SO 


No 

1NU 


1 
I 


azathioprine 


zuu mg/Kg 


A7A 700 


No 


1 
1 


benzene 


mi/Kg 


T3PN 0^0 


No 


1 

1 


benzene 


i mi/ Kg 


RPN 1 000 


No 


X 


benzo[a]pyrene 


mg/Kg 


RAP 10 


No 


I 
A 


bromobenzene 


u.z mi/Kg 


900 


Ypq 

I Co 




bromobenzene 


u.o mi/ Kg 


"DOT} COO 


I Co 


4 


busulfan 


14 mg/Kg 


dUj It 


tin 


1 


cadmium chloride 


i mg/Kg 


PAP> 1 


no 


1 


cadmium chloride 


z mg/Kg 


PAD 9 


No r79h nnlv> 

VS\J \ / All KJlllj ) 


1 


cadmium chloride 


4 mg/Kg 


PAD 4 


Vpe ^ onl 


3 


carbon tetrachloride 


vj.zj mi/Kg 


PPT A 9 SO 


Yeq 

1 Co 


3 


carbon tetrachloride 


i mi/Kg 


PPT 4 1000 


1 CO 


6 


carmustine 


i o mg/Kg 


PAP 1 6 


no 




chloroform 


AOS ml/Vo 

u.zo mi/Kg 


PTTPT 3 9 SO 


no 

ViSJ 




chloroform 


f! S ml /Ira 

u.j mi/Kg 


PPTPT 3 S00 


no 




chlo rprom azine 


Si m or A/"Or 

o mg/Kg 


PHT OP 8 


no 


1 


cniorpromaznie 


nig/ Kg 


PHT OR 30 


no 


l 


cispiaun 


7 S ma/iVa 

^.J lilg/Kg 


CIS 2.5 


no 


- j 


cisplatin | 10 mg/kg | CIS 10 | no | 1
clofibrate | 75 mg/kg | CLO 75 | no |
clofibrate | 250 mg/kg | CLO 250 | no |
clozapine | 45 mg/kg | CLOZ 45 | no |
clozapine | 180 mg/kg | CLOZ 180 | no |
carboxy methyl cellulose | 30 mg/kg | CMC 30 | no |
cycloheximide | 0.5 mg/kg | CHEX 0.5 | no |
cycloheximide | 2 mg/kg | CHEX 2 | no |
cyclophosphamide | 25 mg/kg | CPHOS 25 | no |
cyclophosphamide | 100 mg/kg | CPHOS 100 | no |








WO 03/085083 



PCT/US03/10141 



cyclosporin A | 20 mg/kg | CYCA 20 | no | 1
cyclosporin A | 80 mg/kg | CYCA 80 | no | 1
dexamethasone | 8 mg/kg | DEX 8 | no | 1
dexamethasone | 30 mg/kg | DEX 30 | no | 1
diflunisal | 25 mg/kg | DIF 25 | no |
diflunisal | 100 mg/kg | DIF 100 | no | 1
dimethylnitrosamine | 20 mg/kg | DMN 20 | Yes |
doxorubicin | 12 mg/kg | DOX 12 | no |
erythromycin estolate | 40 mg/kg | ERY 40 | no | 1
erythromycin estolate | 160 mg/kg | ERY 160 | no | 1
estradiol | 0.1 mg/kg | EST 0.1 | no | 1
estradiol | 0.4 mg/kg | EST 0.4 | no | 1
ethanol | 2.5 ml/kg | ETH 2500 | no | 1
gancyclovir | 50 mg/kg | GAN 50 | no | 1
gancyclovir | 200 mg/kg | GAN 200 | no | 1
gentamicin | 38 mg/kg | GEN 38 | no | 1
gentamicin | 150 mg/kg | GEN 150 | no | 1
hydroxyurea | 250 mg/kg | HYD 250 | no | 1
hydroxyurea | 1000 mg/kg | HYD 1000 | no | 1
isoniazid | 50 mg/kg | ISON 50 | no | 1
isoniazid | 200 mg/kg | ISON 200 | no | 1
ketoconazole | 20 mg/kg | KETO 20 | no | 1
ketoconazole | 80 mg/kg | KETO 80 | no | 1
lipopolysaccharide | 2 mg/kg | LPS 2 | no | 1
lipopolysaccharide | 8 mg/kg | LPS 8 | Yes |
methotrexate | 1.3 mg/kg | MET 1.3 | no | 1
methotrexate | 5 mg/kg | MET 5 | no | 1
naloxone | 45 ml/kg | NAL 45 | no | 1
naloxone | 180 mg/kg | NAL 180 | no | 1
phenobarbital | 20 mg/kg | PBARB 20 | no | 1
phenobarbital | 80 mg/kg | PBARB 80 | no |
phenylhydrazine | 20 mg/kg | PHEN 20 | no | 1
phenylhydrazine | 80 mg/kg | PHEN 80 | no |
polyethylene glycol | 5 ml/kg | PEG 5000 | no | 1
puromycin | 38 mg/kg | PUR 38 | no | 1
puromycin | 150 mg/kg | PUR 150 | no | 1
quinidine | 25 mg/kg | QUIN 25 | no | 1
quinidine | 100 mg/kg | QUIN 100 | no | 1
streptozotocin | 20 mg/kg | STRZ 20 | no |
streptozotocin | 75 mg/kg | STRZ 75 | no |
tamoxifen | 50 mg/kg | TAM 50 | no |
tamoxifen | 200 mg/kg | TAM 200 | no |
tetracycline | 50 mg/kg | TET 50 | no |
tetracycline | 150 mg/kg | TET 150 | Yes | 2
theophylline | 25 mg/kg | THEO 25 | no | 1
theophylline | 100 mg/kg | THEO 100 | no | 1



* Values in parentheses indicate that array data are only available for indicated time points 

** Histopathology liver necrosis severity scores. 1 = not remarkable; 2 and higher indicate
histopathology of increasing severity

Table 2 Distribution of Compounds* in Individual
Training and Test Sets for 24 Hour Liver Data 



Training and Test Set 1 



Training Set 1 


Training Set 1 


Test Set 1


Test Set 1 


Negative 


Positive 


Negative


Positive 


C T7TT 


AJN11 




PPT A 


AJVLro 


ArAr 


r 1 at? 


T "DC 


AZA 


r>T>T3 

r>Kt> 


pTTT /~\P 




T) AD 

bAr 


TYlVifM 

UMlN 






BJSfN 








Tjt to 

BUS 




TYCV 




CHCJL3 




TjCT 

Ho 1 




LiibX 




(jrbJN 




CLOZ 




JHLYJJ 




CMC 




TO/TNT 




CPHOS 




A /TCT 

Mb I 




CYCA 


t 


NAL 




DIF 




PHEN 




DOX 




PUR 




ERY 




QUIN 




ETH 




STRZ 




GAN 








KETO 








PBARB 








PEG 








TAM 








THEO 








Training and Test Set 2


Training Set 2 


Training Set 2 


Test Set 2 


Test Set 2 


Negative 


Positive 


Negative 


Positive 


5-FU 


BRB 


AZA 


ANIT 


AMPB 


CCL4 


BAP 


APAP 


BEN 


LPS 


BUS 


DMN 


CAD 


TET 


CAR 




CHEX 




CHCL3 




CHLOR 




CLO 




CIS 




CPHOS 




CLOZ 




DIF




CMC 




DOX


CYCA




ERY 




DEX 




GAN 




EST 




ISON 




ETH 




MET 




GEN 




PHEN




HYD 




PUR 




KETO 




STRZ 




NAL








PBARB 








PEG 








QUIN








TAM 








THEO 









Training and Test Set 3 



Training Set 3
Negative


Training Set 3
Positive


Test Set 3
Negative


Test Set 3
Positive


5-FU 


ANIT 


BAP 


APAP 


AMPB 


BRB 


CAD 


CCL4 


AZA 


DMN 


CHEX 


TET 


BEN 


LPS 


CIS 




BUS 




CLO 




CAR 




CMC 




CHCL3 




CYCA 




CHLOR 




DIF 




CLOZ 




ISON 




CPHOS 




NAL 




DEX 




PEG 




DOX 




PUR 




ERY 




QUIN




EST 




STRZ 




ETH 




TAM 




GAN 




THEO 




GEN 








HYD 








KETO 








MET 








PBARB 








PHEN


Training and Test Set 4



Training Set 4 
Negative 


Training Set 4
Positive


Test Set 4
Negative


Test Set 4
Positive


AZA 


ADATj 

Ar Ax 




A NTT 


T) AD 

BAr 


PPT A 


AA/TPP 




CAJJ 


TYXAXT 

JJJVUN 


"RFTsJ 

JD-EJ.N 


TFT 


OAK 


T "DQ 


.DUO 




CribX 








CHJLUK 








CIS 




fYPA 




CLOZ 




TlTT? 

•U'lr 




CMC 








CPHOS 




FRY 




ETH 




VQT 

J-jO 1 




/~i A XT 

GAN 








GEN 









TT\rp\ 

HYD 




pT ]~D 
Jr VJXv 




TSON 




QUIN 




MET 




TAM 




NAL 








PBARB 








PHEN 








STRZ 








THEO 








DEX 









Training and Test Set 5 



Training Set 5 
Negative 


Training Set 5 
Positive 


Test Set 5 
Negative 


Test Set 5 
Positive 


5-FU 


ANIT 


BAP 


CCL4 


AMPB 


APAP 


BEN 


LPS 


AZA 


BRB 


BUS 


TET 


CAD 


DMN 


CIS 




CAR 




DEX 




CHCL3 




DIF 




CHEX 




ERY 




CHLOR 




EST 




CLO 




GEN 




CLOZ 




HYD 




CMC 




PBARB 




CPHOS 




PEG


PUR
















TAM








THEO 




ISON 








KETO 








MET 








NAL 








PHEN 








QUIN 









* For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.)
** Negative = Compounds that did not elicit histopathology (score = 1)
   Positive = Compounds that did elicit histopathology (score of 2 or greater)

Table 3. List of Genes, Whose Expression at 24h Directly Correlates with Liver
Necrosis at 72h, Ranked by Pearson Correlation Coefficient

Gene | Correlation Coefficient
Gadd153 | 0.649
Phase-1 RCT-179 | 0.641
Superoxide dismutase Mn | 0.633
Gadd45 | 0.613
Phase-1 RCT-144 | 0.613
Calpactin I heavy chain | 0.611
Phase-1 RCT-207 | 0.603
14-3-3 zeta | 0.593
Gamma-actin, cytoplasmic | 0.590
Cyclin G | 0.574
Cathepsin L, sequence 2 | 0.572
Macrophage inflammatory protein-2 alpha | 0.566
Phase-1 RCT-68 | 0.560
Zinc finger protein | 0.553
Multidrug resistant protein-2 | 0.546
Phase-1 RCT-225 | 0.545
Melanoma-associated antigen ME491 | 0.544
60S ribosomal protein L6 | 0.540
Integrin beta1 | 0.539
Organic cation transporter 3 | 0.537
Phase-1 RCT-49 | 0.534
Heme oxygenase | 0.533
Phase-1 RCT-205 | 0.531
Phase-1 RCT-242 | 0.530
Uncoupling protein 2 | 0.528
IgE binding protein | 0.524
Phase-1 RCT-50 | 0.515
Phase-1 RCT-213 | 0.515
Nucleoside diphosphate kinase beta isoform | 0.512
IkB-a | 0.511
Phase-1 RCT-39 | 0.509
Endogenous retroviral sequence, 5' and 3' LTR | 0.508
Phase-1 RCT-192 | 0.507
Phase-1 RCT-109 | 0.504
Phase-1 RCT-145 | 0.504
Phase-1 RCT-152 | 0.503
Phase-1 RCT-154 | 0.502
Voltage-dependent anion channel (Vdac2) | 0.502
Ubiquitin conjugating enzyme (RAD 6 homologue) | 0.499
PAR interacting protein | 0.498
Insulin-like growth factor binding protein 1 | 0.495
Cofilin | 0.493
Ribosomal protein L13A | 0.493
Pyruvate kinase, muscle | 0.493
Beta-actin | 0.492
60S ribosomal protein L6 (alternate clone 1) | 0.492
Phase-1 RCT-37 | 0.482
Phase-1 RCT-72 | 0.481
ID-1 | 0.478
Thymosin beta-10 | 0.472
Osteoactivin | 0.470
Multidrug resistant protein-1 | 0.466
Phase-1 RCT-127 | 0.463
p53 | 0.459
Phase-1 RCT-241 | 0.459
Elongation factor-1 alpha | 0.457
Matrix metalloproteinase-1 | 0.457
c-myc | 0.456
Phase-1 RCT-162 | 0.455
Beta-tubulin, class I | 0.454
Interleukin-1 beta | 0.451

Table 4 List of Genes, Whose Expression at 24h Inversely Correlates with Liver
Necrosis at 72h, Ranked by Spearman Correlation Coefficient

Gene | Correlation Coefficient
Phase-1 RCT-[illegible] | [illegible]
Phase-1 RCT-[illegible] | [illegible]
[illegible] | [illegible]
Phase-1 RCT-164 | [illegible]
Phase-1 RCT-[illegible] | [illegible]
HMG-CoA synthase, mitochondrial | [illegible]
Phase-1 RCT-200 | [illegible]
Phase-1 RCT-161 | [illegible]
Aldehyde dehydrogenase 2 | [illegible]
Phase-1 RCT-117 | [illegible]
Phase-1 RCT-270 | [illegible]
Octamer binding protein 1 | [illegible]
Diazepam binding inhibitor | [illegible]
Phase-1 RCT-189 | [illegible]
Phase-1 RCT-175 | [illegible]
Cytochrome P450 11A1 | [illegible]
Phase-1 RCT-123 | [illegible]
Phase-1 RCT-239 | [illegible]
Phase-1 RCT-64 | -0.367
Phase-1 RCT-8 | -0.371
Phase-1 RCT-131 | -0.374
Preproalbumin, sequence 2 | -0.376
Fatty acid synthase | -0.379
NADP-dependent isocitrate dehydrogenase, cytosolic | -0.380
Phase-1 RCT-290 | -0.380
Extracellular-signal-regulated kinase 1 | -0.380
ATPase inhibitor (rat mitochondrial IF1 protein) | -0.381
Phase-1 RCT-40 | -0.381
Stem cell factor | -0.384
Phase-1 RCT-227 | -0.384
Apolipoprotein AII | -0.387
NADH-cytochrome b5 reductase | -0.388
Histidine-rich glycoprotein | -0.390
Phase-1 RCT-280 | -0.390
Methylacyl-CoA racemase alpha | -0.392
Contrapsin-like protease inhibitor (CPi-21) | -0.394
Phase-1 RCT-209 | -0.394
Glutathione peroxidase | -0.398
Betaine homocysteine methyltransferase (BHMT) | -0.400
Aquaporin-3 (AQP3) | -0.403
Phase-1 RCT-233 | -0.405
Sterol carrier protein 2 | -0.407
Tryptophan hydroxylase | -0.408
Cytochrome P450 3A1 | -0.409
Phase-1 RCT-83 | -0.411
Senescence marker protein-30 | -0.416
Phase-1 RCT-289 | -0.416
Carbonic anhydrase III, sequence 2 | -0.417
Phase-1 RCT-185 | -0.418
Transthyretin | -0.419
Phase-1 RCT-181 | -0.420
Sodium/bile acid cotransporter | -0.423
Paraoxonase 1 | -0.426
Phase-1 RCT-128 | -0.426
Phase-1 RCT-182 | -0.430
Phase-1 RCT-296 | -0.430
Phase-1 RCT-291 | -0.431
Phase-1 RCT-264 | -0.432
Phase-1 RCT-52 | -0.437
Aldehyde dehydrogenase, microsomal | -0.442
Organic anion transporter 3 | -0.442
Presenilin-1 | -0.447
Phase-1 RCT-102 | -0.449
Phase-1 RCT-89 | -0.449
Phase-1 RCT-218 | -0.450
N-hydroxy-2-acetylaminofluorene sulfotransferase (ST1C1) | -0.452
Liver fatty acid binding protein | -0.456
Apolipoprotein CIII | -0.456
Phase-1 RCT-88 | -0.457
Phase-1 RCT-168 | -0.457
Alpha 1-inhibitor III | -0.461
Phase-1 RCT-288 | -0.464
Equilibrative nitrobenzylthioinosine-sensitive nucleoside transporter | -0.465
Phase-1 RCT-33 | -0.465
Phase-1 RCT-256 | -0.466
Phase-1 RCT-36 | -0.468
Dynamin-1 (D100) | -0.470
L-gulono-gamma-lactone oxidase | -0.472
Phase-1 RCT-38 | -0.477
Phase-1 RCT-214 | -0.478
Carbonic anhydrase III | -0.485
Matrin F/G | -0.489
Phase-1 RCT-92 | -0.492
Hepatic lipase | -0.498
Phase-1 RCT-78 | -0.507
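Tables 3 and 4 rank genes by the correlation between 24 hour expression and the histopathology score. A minimal sketch of such a ranking follows. It assumes one expression value per sample for each gene and one score per sample, computes Spearman correlation as the Pearson correlation of the ranks, and uses hypothetical names (`ranks`, `spearman`, `rank_genes`) and placeholder gene identifiers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for m in range(i, j + 1):
            result[order[m]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


def rank_genes(expression, scores, method=pearson):
    """Return (gene, coefficient) pairs, most positive correlation first.

    expression: {gene name: [expression value per sample]}
    scores: [histopathology score per sample], in the same sample order
    """
    table = [(gene, method(profile, scores))
             for gene, profile in expression.items()]
    return sorted(table, key=lambda row: row[1], reverse=True)
```

Genes at the top of the returned list correlate directly with the score, as in Table 3, and genes at the bottom correlate inversely, as in Table 4. Passing `method=spearman` makes the ranking depend only on the ordering of values, which suits semiquantitative severity scores.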






Table 5 Predictive Genes for 24 Hour Expression Data

Gene Name | Combination Category
Gamma-actin, cytoplasmic | 5
Matrin F/G | 5
Phase-1 RCT-78 | 5
Cathepsin L, sequence 2 | 4
Gadd45 | 4
Phase-1 RCT-144 | 4
Phase-1 RCT-145 | 4
Phase-1 RCT-50 | 4
Phase-1 RCT-92 | 4
Zinc finger protein | 4
14-3-3 zeta | 3
Dynamin-1 (D100) | 3
Insulin-like growth factor binding protein 1 | 3
L-gulono-gamma-lactone oxidase | 3
Ornithine decarboxylase | 3
PAR interacting protein | 3
Phase-1 RCT-128 | 3
Phase-1 RCT-180 | 3
Phase-1 RCT-182 | 3
Phase-1 RCT-207 | 3
Phase-1 RCT-213 | 3
Phase-1 RCT-256 | 3
Phase-1 RCT-258 | 3
Phase-1 RCT-264 | 3
Phase-1 RCT-271 | 3
Phase-1 RCT-288 | 3
Phase-1 RCT-33 | 3
Phase-1 RCT-36 | 3
Phase-1 RCT-38 | 3
Phase-1 RCT-39 | 3
Phase-1 RCT-68 | 3
Phase-1 RCT-89 | 3
Phase-1 RCT-139 | 2
3-hydroxyisobutyrate dehydrogenase | 2
60S ribosomal protein L6 | 2
Alpha 1-inhibitor III | 2
Bax (alpha) | 2
Beta-actin | 2
Carbonic anhydrase III | 2
c-myc | 2
Epidermal growth factor | 2
Equilibrative nitrobenzylthioinosine-sensitive nucleoside transporter | 2
Heme oxygenase | 2
Hepatic lipase | 2
ID-1 | 2
Insulin-like growth factor binding protein 3 | 2
Integrin beta1 | 2
N-hydroxy-2-acetylaminofluorene sulfotransferase (ST1C1) | 2
Organic anion transporter 3 | 2
Paraoxonase 1 | 2
Phase-1 RCT-102 | 2
Phase-1 RCT-117 | 2
Phase-1 RCT-[illegible] | 2
Phase-1 RCT-[illegible] | 2
Phase-1 RCT-179 | 2
Phase-1 RCT-189 | 2
Phase-1 RCT-191 | 2
Phase-1 RCT-241 | 2
Phase-1 RCT-270 | 2
Phase-1 RCT-291 | 2
Voltage-dependent anion channel (Vdac2) | 2
Phase-1 RCT-[illegible] | 2
Phase-1 RCT-40 | 2
Phase-1 RCT-[illegible] | 2
Phase-1 RCT-49 | 2
Phase-1 RCT-[illegible] | 2
Ribosomal protein [illegible] | 2
Senescence marker protein-30 


2 


60S ribosomal protein L6 (alternate clone 1) 


1 


25-DX 


1 


Ailatoxin Bl aldehyde reductase 


1 


Aldehyde dehydrogenase, microsomal 





Alpha-2-macroglobulin 


j 


Apohpoprotein CHI 


1 


Argininosuccinate lyase 


1 


ATPase inhibitor (rat mitochondrial IFI protein) 




jDCla-lUDUlin, t/JlaSS 1 


T 


Calpactin I heavy chain 




Carbamyl phosphate synthetase I 




Carbonyl reductase 




c-H-ras 




c-jun 
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Cofilin 


1 


Cyclin G 


1 


DNA polymerase beta 


1 


Elongation factor- 1 alpha 


1 


Endogenous retroviral sequence, 5' and 3' LTR 


1 


Enolase alpha 


1 


Extracellular-signal-regulated kinase 1 


1 


Fas antigen 


1 


Gaddl53 


1 


Glucose-regulated protein 78 


1 


IgE binding protein 


1 


IkB~a 


1 


Insulin-like growth factor I 


1 


Liver fatty acid binding protein 


1 


Macrophage inflammatory protein-1 alpha 


1 


Macrophage inflammatory protein-2 alpha 


1 


MAP kinase kinase 


1 


Matrix metalloproteinase-1 


1 


Melanoma-associated antigen ME491 


1 


Monocyte chemotactic protein receptor (CCR2) 


1 


Multidrug resistant protein-1 


1 


Multidrug resistant protein-2 


1 


NADPH quinone oxidoreductase-1 (DT-diaphorase) 


1 


Nucleoside diphosphate kinase beta isoform 


1 


p53 


1 


Phase-1 RCT-252


1 


Phase-1 RCT-109 


1 


Phase-1 RCT-12 


1 


Phase-1 RCT-127 


1 


Phase-1 RCT-137 


1 


Phase-1 RCT-15 


1 


Phase-1 RCT-154 


1 


Phase-1 RCT-162 


1 


Phase-1 RCT-168 


1 


Phase-1 RCT-181 


1 


Phase-1 RCT-185 


1 


Phase-1 RCT-192 


1 


Phase-1 RCT-205 


1 


Phase-1 RCT-214 




Phase-1 RCT-225 




Phase-1 RCT-239 




Phase-1 RCT-242 




Phase-1 RCT-37 




Phase-1 RCT-55 
Phase-1 RCT-65


1


Phase-1 RCT-72


1


Phase-1 RCT-88


1 


Proliferating cell nuclear antigen gene 


1 


Pyruvate kinase, muscle 


1 


Ref-1 


1 


Ribosomal protein L13A


1 


Ribosomal protein S8 


1 


Ribosomal protein S9 




Sodium/bile acid cotransporter 


1 


Superoxide dismutase Mn 




T-cell cyclophilin 




Thymosin beta-10 




Transthyretin 




Ubiquitin conjugating enzyme (RAD 6 homologue) 




Uncoupling protein 2 





* Combination category is the number of training/test set gene list occurrences. 
Table 6 Randomly Selected Gene Subsets from 24 hour Combo All Gene Set (142 genes)*



Rand 5 

Phase-1 RCT-117 

Aflatoxin B1 aldehyde reductase
Phase-1 RCT-128 
Insulin-like growth factor I 
Phase-1 RCT-258 



Rand 10 

Phase-1 RCT-139

60S ribosomal protein L6 

NADPH quinone oxidoreductase-1 (DT-diaphorase) 

Liver fatty acid binding protein 

MAP kinase kinase 

Melanoma-associated antigen ME491 

Pyruvate kinase, muscle 

Phase-1 RCT-168 

Phase-1 RCT-185

T-cell cyclophilin 



Rand 15 

Phase-1 RCT-192 
Phase-1 RCT-191
Phase-1 RCT-15
Phase-1 RCT-189
Multidrug resistant protein-1

Cyclin G 

Urinary protein 2 precursor 
Phase-1 RCT-271 
Phase-1 RCT-185 
Phase-1 RCT-139 
Phase-1 RCT-109 
Phase-1 RCT-55 
Phase-1 RCT-258 
Phase-1 RCT-33 
Argininosuccinate lyase 
* Genes were randomly selected from the Combo All list of predictive genes (142 genes) by
assigning a random number to each gene, sorting by the random number, and selecting the
appropriate number of sorted genes.
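
The selection procedure described in this footnote can be sketched as follows. This is a minimal illustration only; the function name `random_gene_subset`, the `seed` parameter, and the placeholder gene names are assumptions and are not part of the specification.

```python
import random


def random_gene_subset(genes, k, seed=None):
    """Select k genes by assigning each gene a random number,
    sorting by that number, and keeping the first k sorted genes."""
    rng = random.Random(seed)  # seed is a hypothetical convenience for repeatability
    tagged = [(rng.random(), gene) for gene in genes]
    tagged.sort(key=lambda pair: pair[0])
    return [gene for _, gene in tagged[:k]]


# Placeholder names standing in for the 142 Combo All genes.
combo_all = [f"gene_{i}" for i in range(142)]
rand5 = random_gene_subset(combo_all, 5, seed=0)
rand10 = random_gene_subset(combo_all, 10, seed=1)
rand15 = random_gene_subset(combo_all, 15, seed=2)
```

Sorting on an independent random key gives every gene the same chance of landing in the first k positions, so the result is a simple random sample without replacement.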
Table 7 Randomly Selected Gene Subsets from
24 hour Combos 5, 4, 3 combined (32 genes)* 



Rand 5 
Phase-1 RCT-182
Phase-1 RCT-258 
Phase-1 RCT-38 
Phase-1 RCT-78 

Gadd45 



Rand 10 

Phase-1 RCT-50
Phase-1 RCT-213 
Phase-1 RCT-182 

Phase-1 RCT-89 
Dynamin-1 (D100) 

Phase-1 RCT-38 
PAR interacting protein 
Phase-1 RCT-256 
Phase-1 RCT-145 

Phase-1 RCT-39 



Rand 15 

Phase-1 RCT-144

Phase-1 RCT-180

Phase-1 RCT-33

Zinc finger protein

Phase-1 RCT-288

Dynamin-1 (D100)

Phase-1 RCT-39 

14-3-3 zeta 

Insulin-like growth factor binding protein 1 

Phase-1 RCT-78 

Ornithine decarboxylase 

L-gulono-gamma-lactone oxidase 
Cathepsin L, sequence 2 

Phase-1 RCT-207 

Phase-1 RCT-92 
* Genes were randomly selected from the Combo 5, 4, and 3 combined list of predictive
genes (32 genes) by assigning a random number to each gene, sorting by the random number, and
selecting the appropriate number of sorted genes.
Table 8 Randomly Selected Gene Subsets from
24 hour All- Predictive (Nonpredictive) Genes 



All-Pred 10 genes

Phenylalanine hydroxylase

Colony-stimulating factor-1

Ciliary neurotrophic factor

Ribosomal protein L13

S-adenosylmethionine decarboxylase

Notch 1

Phase-1 RCT-91

CTP:phosphocholine cytidylyltransferase

Cytochrome P450 1A1

Phase-1 RCT-60 



All-Pred 5 genes

Cellular nucleic acid binding protein (CNBP) 

VL30 element 

Hemoglobin alpha 1 chain (alternate clone) 

Complement component C3 

Thrombomodulin 



All-Pred 15 genes 

Nucleosome assembly protein 

Neutral endopeptidase 24.11 (enkephalinase)

Cyclin D1

Lactate dehydrogenase-B 

Selenoprotein P 

Clusterin 

Biliverdin reductase 

Phase-1 RCT-79 

Caspase 3 

Adrenomedullin 

Ribosomal protein L13

Cytochrome P450 2C39 (alternate clone 2) 

Phase-1 RCT-277 

Carnitine palmitoyl-CoA transferase 
Cytochrome P450 2D18






WO 03/085083 PCT/US03/10141 



Table 9 Liver Toxicity Individual Sample Prediction Values for 24 Hour Data 
Predictive Genes (Combined List and Subsets) 





Number 






Prediction Measure* 




Gene Set 


of Genes 


Accuracy**


False Positive**


False Negative**


Geometric Mean**


Combo All 


142






.70JP5iQ-9PP .:9- ? 33)7 




Combo 5 


3






9 • P?-%.(P pop 7: P: > 


0.901 (0.868 - 0.941)


Combo 4 


7






0.128 (0.083 - 0.222)


0.862 (0.800 - 0.915)


Combo 3 


22






0.050 (0.000 - 0.083)


7o.?00 tj6iM^° 9? 4 > 


Combo 2 


36 


..^ 






~P^P^iP-7P?7 P: 9 ^) 


Combo 1 


74 


0.894 (0.853 - 0.941)


. -PI 1(PP '-9A 




0.844 (0.705 - 0.949)



* Prediction measures are given as means and range of values (in parentheses) for five 

training/test sets using 24 hour array data and gene lists as presented in Table . Unit of 

prediction was the animal and the predictive classification was for liver necrosis observed at 
72 hours after treatment. 



** Standard prediction measures were used as defined in Materials and Methods. These
include:

Accuracy: proportion of the total number of predictions that are correct
False positive rate: proportion of negative cases that are incorrectly classified as positive
False negative rate: proportion of positive cases that are incorrectly classified as negative
Geometric mean: performance measure that takes into account the proportion of positive and
negative cases
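
The four measures listed above can be illustrated with a short sketch. The geometric mean formula used here, the square root of (1 - false positive rate) times (1 - false negative rate), is an assumption: the exact definition is given in Materials and Methods, which is not reproduced in this section. The function name `prediction_measures` is likewise hypothetical.

```python
import math


def prediction_measures(actual, predicted):
    """Compute accuracy, false positive rate, false negative rate, and
    geometric mean from parallel lists of booleans (True = toxic).
    Assumes at least one positive and one negative case are present."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)

    accuracy = (tp + tn) / len(pairs)
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (fn + tp)
    # Assumed definition: sqrt(specificity * sensitivity).
    geometric_mean = math.sqrt((1 - false_positive_rate) * (1 - false_negative_rate))
    return {
        "accuracy": accuracy,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
        "geometric_mean": geometric_mean,
    }
```

Under this assumed definition, a classifier that labels every sample negative scores a geometric mean of zero even when negatives dominate the data, which is why such a measure is useful alongside plain accuracy for unbalanced training/test sets.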
Table 10 Liver Toxicity Compound-Dose Prediction Values for 24 Hour Data Predictive
Genes (Combined List and Subsets) 



                         Prediction Measure*
Gene Set    No. Genes    Accuracy**               False Positive**         False Negative**         Geometric Mean**
Combo All   142          0.935 (0.879 - 0.971)    0.066 (0.032 - 0.100)    0.067 (0.000 - 0.333)    0.932 (0.775 - 0.984)
Combo 5     3            0.941 (0.879 - 1.000)    0.065 (0.000 - 0.133)    0.000 (0.000 - 0.000)    0.966 (0.931 - 1.000)
Combo 4     7            0.912 (0.879 - 0.943)    0.085 (0.032 - 0.133)    0.117 (0.000 - 0.333)    0.895 (0.775 - 0.967)
Combo 3     22           0.905 (0.758 - 0.971)    0.105 (0.032 - 0.267)    0.000 (0.000 - 0.000)    0.945 (0.856 - 0.984)
Combo 2     36           0.947 (0.909 - 0.971)    0.059 (0.032 - 0.100)    0.000 (0.000 - 0.000)    0.970 (0.949 - 0.984)
Combo 1     44           0.936 (0.879 - 0.971)    0.065 (0.032 - 0.100)    0.067 (0.000 - 0.333)    0.932 (0.775 - 0.984)


* Prediction measures are given as means and range of values (in parentheses) for five 
training/test sets using 24 hour array data and gene lists as presented in Table 35 and Table 5. 
Unit of prediction was compound-dose level and the predictive classification was for liver 
necrosis observed at 72 hours after treatment. Prediction for compound-dose was based on a 
majority of individual animal calls. In cases where there were an equal number of opposing 
calls or no calls a no-call was assigned to the compound-dose level. 

** Standard prediction measures were used as defined in Materials and Methods. As 
described in Materials and Methods in cases where no prediction was made because the p- 
value ratio exceeded the cutoff- value (generally 0.5) the non-call was considered to be 
incorrect. 
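
The compound-dose rule in the footnotes above (majority of individual animal calls, with ties or an absence of calls giving a no-call, and no-calls scored as incorrect) can be sketched as below. The function names and the string labels `"toxic"` and `"nontoxic"` are assumptions for illustration. The sketch also assumes that animal-level no-calls are simply ignored when the majority is counted, which the footnote does not state explicitly.

```python
from collections import Counter


def compound_dose_call(animal_calls):
    """Combine per-animal calls ('toxic', 'nontoxic', or None for a no-call)
    into one call for the compound-dose group. Returns None for a no-call."""
    counts = Counter(call for call in animal_calls if call is not None)
    toxic, nontoxic = counts["toxic"], counts["nontoxic"]
    if toxic > nontoxic:
        return "toxic"
    if nontoxic > toxic:
        return "nontoxic"
    return None  # equal opposing calls, or no calls at all


def accuracy_with_no_calls(actual, calls):
    """Accuracy where a no-call (None) is counted as an incorrect prediction."""
    correct = sum(1 for a, c in zip(actual, calls) if c is not None and c == a)
    return correct / len(actual)
```

For example, a group with calls `["toxic", "toxic", "nontoxic"]` is called toxic, while `["toxic", "nontoxic"]` yields a no-call that lowers accuracy regardless of the true class.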






Table 11 Liver Toxicity Compound Prediction Values for 24 Hour Data Predictive Genes
(Combined List and Subsets) 



                         Prediction Measure*
Gene Set    No. Genes    Accuracy**               False Positive**         False Negative**         Geometric Mean**
Combo All   142          0.937 (0.842 - 1.000)    0.063 (0.000 - 0.125)    0.067 (0.000 - 0.333)    0.934 (0.764 - 1.000)
Combo 5     3            0.947 (0.895 - 1.000)    0.063 (0.000 - 0.125)    0.000 (0.000 - 0.000)    0.968 (0.935 - 1.000)
Combo 4     7            0.926 (0.842 - 1.000)    0.063 (0.000 - 0.125)    0.133 (0.000 - 0.333)    0.898 (0.764 - 1.000)
Combo 3     22           0.937 (0.842 - 1.000)    0.075 (0.000 - 0.188)    0.000 (0.000 - 0.000)    0.961 (0.901 - 1.000)
Combo 2     36           0.958 (0.895 - 1.000)    0.050 (0.000 - 0.125)    0.000 (0.000 - 0.000)    0.974 (0.935 - 1.000)
Combo 1     44           0.947 (0.842 - 1.000)    0.050 (0.000 - 0.125)    0.067 (0.000 - 0.333)    0.940 (0.764 - 1.000)


* Prediction measures are given as means and range of values (in parentheses) for five 
training/test sets using 24 hour array data and gene lists as presented in Table 35 and Table 5. 
Unit of prediction was the compound and the predictive classification was for liver necrosis 
observed at 72 hours after treatment. Compounds were considered toxic if any compound- 
dose level for that compound was predicted as toxic. 

** Standard prediction measures were used as defined in Materials and Methods. As 
described in Materials and Methods in cases where no prediction was made because the p- 
value ratio exceeded the cutoff-value (generally 0.5) the non-call was considered to be 
incorrect. 
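
The compound-level rule stated above (a compound is considered toxic if any of its compound-dose levels is predicted toxic) can be sketched as follows. The function name, the dictionary layout, and the handling of a compound whose dose levels are all no-calls are assumptions; the footnote does not say how that last case was treated.

```python
def compound_call(dose_calls):
    """dose_calls maps a dose level to 'toxic', 'nontoxic', or None (no-call).
    The compound is called toxic if any dose level was predicted toxic."""
    calls = list(dose_calls.values())
    if any(call == "toxic" for call in calls):
        return "toxic"
    if any(call == "nontoxic" for call in calls):
        return "nontoxic"
    return None  # assumption: every dose level was a no-call


# Hypothetical example: only the high dose is predicted toxic.
example = compound_call({"low": "nontoxic", "mid": None, "high": "toxic"})
```

Because a single toxic dose level decides the call, this rule favors sensitivity: it cannot lower the false negative rate below that of the dose-level calls it is built from, but it can raise the false positive rate.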



Table 12 Individual Gene Predictions: Combo 5



                              Overall Correct Calls (%)
Gene Name                     Mean    s.d.    min     max
Gamma-actin, cytoplasmic      89.2    4.5     84.0    95.1
Matrin F/G                    82.0    8.1     74.8    91.1
RCT-078                       76.5    18.8    50.5    91.1
Average Individual Combo 5    82.6    10.5    69.8    92.4
Minimum Individual Combo 5    76.5    4.5     50.5    91.1
Maximum Individual Combo 5    89.2    18.8    84.0    95.1



Table 13 Individual Gene Predictions: Combo 4 



                              Overall Correct Calls (%)
Gene Name                     Mean    s.d.    min     max
Gadd45                        80.9    8.7     70.2    92.1
Cathepsin L                   77.3    9.1     67.0    90.0
Zinc Finger Protein           85.1    9.6     70.2    93.1
RCT-144                       90.6    2.8     86.5    94.1
RCT-145                       77.6    5.3     69.1    82.3
RCT-50                        69.0    8.3     58.7    80.6
RCT-92                        85.0    4.3     80.4    89.4
Average Combo 4               80.8    6.9     71.7    88.8
Minimum Individual Combo 4    69.0    2.8     58.7    80.6
Maximum Individual Combo 4    90.6    9.6     86.5    94.1
Table 14 Individual Gene Predictions: Combo 3



Gene Name


Overall Correct Calls (%) 




Mean 


s.d.


min 


max 


14-3-3 zeta 


11 8 




00. u 


o4.2 


Dynamin-1 (D100) 


D l.U 


1 O A 




81.7 


Insulin-like growth factor binding protein 1


73.6 


4.8 


69.2 


81.4 


L-gulono-gamma-lactone oxidase 


89 1 
oZ. 1 


1£ 7 
10. / 


^9 A 


Q1 1 


Ornithine decarboxylase 


7^ ^ 

/ j, D 


1 1 £ 
X 1 .0 




8A ^ 


PAR interacting protein 


R\ R 

Ol.O 


^ 0 
J. 


77 Q 


88 9 
oo.Z 


Phase-1 RCT-128 


S8 9 


94 Q 


97 7 
z /. / 


8^ £ 
OJ.O 


Phase-1 RCT-180 


69 ^ 


7 9 
/ .z 


^9 1 


71 £ 
/ 1 .0 


Phase-1 RCT-182 


S6 1 


7 

JV. / 


98 J. 


on a 


Phase-1 RCT-207 


77 0 


7 7 


OXO 


80 A 


Phase-1 RCT-213 


7zL ^ 


A 8 


£8 1 
Oo.l 


8fl 8 
oU.o 


Phase-1 RCT-256 


o / , j 


90 9 
ZU.Z 


Al < 


8£ 1 
00. l 


Phase-1 RCT-258 


oi.J 


7 1 
/.I 


71 7 


OO.j 


Phase-1 RCT-264 




9£ 7 
Z0. / 


98 7 
Zo. / 


88 A 


Phase-1 RCT-271 


6^ 9 


98 < 


*XA n 


01 1 


Phase-1 RCT-288 


65 3 


30 4 


26 7 


RR $ 

OO. J 


Phase-1 RCT-33 


69.4 


27.0 


38.6 


90.4 


Phase-1 RCT-36 


77.2 


27.6 


27.9 


91.1 


Phase-1 RCT-38 


57.9 


21.5 


37.2 


83.2 


Phase-1 RCT-39 


78.4 


8.5 


71.6 


93.1 


Phase-1 RCT-68 


67.5 


13.2 


52.5 


88.5 


Phase-1 RCT-89 


69.5 


16.1 


45.2 


86.2 


Average Individual Combo 3 


69.8 


16.7 


48.7 


86.5 


Minimum Individual Combo 3 


51.0 


3.9 


26.7 


71.6 


Maximum Individual Combo 3 


82.1 


30.7 


77.9 


93.1 
Table 15 Liver Toxicity Compound-Dose Prediction Values for 24 Hour Data with
Random Gene Subsets 



Gene Set
Combo All 
Combo All 
Combo All 

Combo 5 4 3 
Combo 5 4 3 
Combo 5 4 3 




Prediction Measure* 



Accuracy** 

M? \ (0-750- 0j[86) 
0.889 (0.857-0.914)



0.886 (0.719-0.972)
^§^7^2^07941)" 
0.824 (0"8-0.844)"" 



False Positive** 



Oil 74 (0.129-0^58) 
0.123 (0.097-0.161)



0 <1 124J0.031-0.300) 
67l 2?J0]O65^p. 1 94jT 
0.181 (0.161-0.194)



False Negative**

_ " 6.3 (0-1,6) 



_0 

Ojlf (0-0™333)' 



Geometric Mean** 
'67936J0^ 
y'T.670(6"^6"933j 



P:?_ 3 _? .19..? 1 6 :p950) 



0^34 (0.837-0.984) 
0.933 (6!)398-0\9G7) ' 
0.847 "(0748-6.913) ' 



Randomly selected sets of genes derived from the Combo sets are described in Tables 1-2.



* Prediction measures are given as means and range of values (in parentheses) for five 
training/test sets using 24 hour array data and random subsets of genes as presented in Table 
35, Table 6, and Table 7. Unit of prediction was compound-dose and the predictive 
classification was for liver necrosis observed at 72 hours after treatment. 



** Standard prediction measures were used as defined in Materials and Methods. As 
described in Materials and Methods in cases where no prediction was made because the p- 
value ratio exceeded the cutoff-value (generally 0.5) the non-call was considered to be 
incorrect.
Table 16 Comparison of Predictivity for Correct Liver Toxicity Classification and Random
Classification Using Combo Gene Sets and Random Subsets and 24h data 



                                Accuracy,                   Accuracy,
                                Correct Classification**    Random Classification**
Gene List*      Gene Subset*    Mean (Min - Max)            Mean (Min. - Max.)
Combo All       All Genes       92.4 (87.2 - 96.0)          26.8 (20.4 - 35.6)
                5 genes         83.9 (80.1 - 86.1)          29.9 (15.5 - 49.0)
                10 genes        78.0 (74.5 - 81.6)          28.2 (18.1 - 33.0)
                15 genes        oo.z (7ft 7. - 90.3)        25.0 (17.0 - 36.5)
Combo 5 4 3     All Genes       91.0 (86.2 - 95)            27.4 (18.1 - 36.6)
                5 genes         79.9 (72.8 - 91.3)          29.4 (26.5 - 34.0)
                10 genes        84.7 (81.6 - 90.1)          28.2 (24.0 - 32.7)
                15 genes        82.5 (72.3 - 91.3)          25.0 (13.8 - 30.8)
Combo 5         All Genes       89.6 (83.7 - 96.1)          25.2 (22.8 - 28.2)
Combo 4         All Genes       85.7 (79.6 - 91.3)          26.6 (17.0 - 41.3)
Combo 3         All Genes       86.5 (75.5 - 92.3)          27.5 (21.3 - 34.7)
Combo 2         All Genes       91.2 (85.1 - 95.0)          23.9 (18.1 - 29.8)
Combo 1         All Genes       89.4 (85.3 - 94.1)          24.2 (18.8 - 34.0)
All - Predict   5 genes         47.4 (34.3 - 63.5)          25.3 (19.8 - 30.8)
                10 genes        67.4 (59.2 - 80.1)          24.7 (14.6 - 30.9)
                15 genes        45.9 (32.7 - 63.5)          26.5 (15.8 - 33.3)



* Combo Gene Lists as in Example 1, Table 1. For Combo lists all genes were used 
or random subsets as in Tables 1-3. All-Pred used genes randomly selected from 
genes that were present on the array but not in the predictive list. 

** Accuracy = proportion of the total number of predictions that are correct. Non- 
calls are counted as incorrect predictions. Accuracy was calculated for correct 
classifications of liver toxicity assigned to the samples and for randomized 
classifications in the same proportions as the correct classifications. Values presented 
are the mean accuracy values for 5 training/test sets with minimum and maximum 
accuracy values. 
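
The randomized-classification control described in this footnote can be sketched as a label shuffle that keeps the proportion of toxic and nontoxic samples unchanged, followed by the mean/min/max summary reported in the table. This is only an illustration under stated assumptions: the function names are hypothetical, and the model training and testing step that sits between the shuffle and the summary is not shown because it is defined elsewhere in the specification.

```python
import random
from statistics import mean


def randomize_classification(labels, seed=None):
    """Return the same class labels in random order, so the proportion of
    each class is identical to that of the correct classification."""
    rng = random.Random(seed)  # seed is a hypothetical convenience
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def summarize(accuracies):
    """Mean, minimum, and maximum accuracy over the training/test sets."""
    return mean(accuracies), min(accuracies), max(accuracies)


# Hypothetical labels for one training set: 3 toxic, 7 nontoxic.
correct_labels = ["toxic"] * 3 + ["nontoxic"] * 7
random_labels = randomize_classification(correct_labels, seed=0)
```

Comparing the accuracy obtained with `correct_labels` against that obtained with `random_labels` indicates how much of the predictive performance depends on the real toxicity assignments rather than on class proportions alone.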
Table 17 Distribution of Compounds* in Individual Training
and Test Sets for 6 Hour Liver Data 



Training and Test Set 1 



Training Set 1


Training Set 1


Test Set 1


Test Set 1


Negative


Positive


Negative


Positive


5-FU


AFLB


BUS


APAP


AMPB


ANIT


CHCL3


CCL4


AZA


BRB


CHEX


TET


BAP


CAD






BbJN 




UrrlUb 




CAK 


T "DC 

Lro 


CjAIV 




UiiLUK 




xli JJ 












CJJJ 




JVLlil 












V./ 1 L/A 




T)T rp 
rUK 




TYCY 




1 AJVX 




r\TT? 

JJir 








JJUA 
















EST 








ETH 








GEN 








KETO 








NAL 








PBARB 








PHEN 








QUEST 








STRZ 








THEO 








Training and Test Set 2 


Training Set 2 


Training Set 2 


Test Set 2 


Test Set 2 


Negative 


Positive 


Negative 


Positive 


AMPB 


APAP 


5-FU 


AFLB 


AZA 


BRB 


BEN 


ANIT 


BAP 


CCL4 


CHCL3 


CAD 


BUS 


DMN 


CHLOR 




CAR 


LPS 


CIS 




CHEX 


TET 


CLOZ 




CLO 




CMC 
TYFY








nre? 

JJJLT 
















PPV 

JCivI 




XT AT 




Co 1 




nTTTM 




FTH" 
Din 
















npw 








n i u 
















MPT 

JLYJL.C 1 
















PEG 








PHEN 








PUR 








STRZ 








TAM 








THEO 









Training and Test Set 3 



Training Set 3 


Training Set 3 


Test Set 3 


Test Set 3 


Negative 


Positive 


Negative 


Positive 


AMPB 


AFLB 


5-FU 


CCL4 


AZA 


ANIT 


CHEX 


DMN 


BAP 


APAP 


CMC 


LPS 


BEN 


BRB 


CPHOS 




BUS 


CAD 


CYCA 




CHCL3 


TET 


DOX 




CHLOR 




MET 




CIS 




NAL 




CLO 




PHEN 




CLOZ 




PUR 




DEX 




QUIN 




DIF 




THEO 




ERY 








EST 








ETH 








GAN 








GEN 








HYD 








ISON 
KETO








PBARB 








PEG 








STRZ 








TAM 








CAR 









Training and Test Set 4 



Training Set 4


Training Set 4


Test Set 4


Test Set 4


Negative


Positive


Negative


Positive


C rjTT 
J-JT U 






ArLr> 


ATV/TPP 
AiVJJrx3 


AP AP 
ArAr 


"DAP 


npT A 


"RT7XT 


r>IVD 




JLJtil 


T3T TQ 


p An 


PT A7 




PAP 


TYN/TM 


PlTT? 
Ulr 






T PQ 










PTTT 
J! in 




PTC 








PT O 




pep 




CMC 




r n r*ji m 




CPHOS 




PUR 




CYCA 




QUIN 




DEX 








ERY 








EST 








GAN 








GEN 








ISON 








KETO 








MET 








NAL 








PBARB 








STRZ 








TAM 








THEO 








Training and Test Set 5 


Training Set 5 


Training Set 5 


Test Set 5 


Test Set 5 


Negative 


Positive 


Negative 


Positive 


5-FU 


AFLB 


AZA 


BRB 
AMPB


ANIT 


BAP 


DMN 


BEN 


APAP 


BUS 


LPS 


CAR 


CAD 


CLO 




CHCL3 


CCL4 


CLOZ 




CHEX 


TET 


DOX 




CHLOR 




GEN 




CIS 




MET 




CMC 




NAL 




CPHOS 




PEG 




CYCA 




PHEN 




DIF 




QUIN 




ERY 








EST 








ETH 








GAN 








HYD 








ISON 








KETO 








PBARB 








PUR 








STRZ 








TAM 








THEO 








DEX 









* For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.) 
** Negative= Compounds that did not elicit histopathology (score=1)
Positive= Compounds that did elicit histopathology (score of 2 or greater) 
Table 18 List of Genes Whose Expression at 6 h Directly Correlates with Liver
Hepatocellular Necrosis at 72h, Ranked by Pearson Correlation Coefficient 



Gene


Correlation 
Coefficient 


Alpha-tubulin 


0.6309915 


Superoxide dismutase Mn 


0.6104141 


Cathepsin L 


0.6078458 


Gadd45 


0.5948032 


ID-1 


0.5895025 


Argininosuccinate lyase 


0.5767352 


c-fos 


0.5752904 


Beta-actin, sequence 2 


0.5710737 


c-H-ras 


0.5661596 


Phase-1 RCT-211 


0.5653724 


Thymosin beta-10 


0.5610229 


Gadd153


0.5517636 


Uncoupling protein 2 


0.5497988 


Heme binding protein 23 


0.5460188 


Ribosomal protein L13A


0.5443944 


alpha-1,2-fucosyltransferase


0.543632 


Aldehyde dehydrogenase 2 


0.5385723 


Phase-1 RCT-50 


0.5211563 


Phase-1 RCT-109 


0.5203465 


Ecto-ATPase 


0.5152093 


Phase-1 RCT-24 


0.5125332 


Bax (alpha) 


0.5095243 


Phase-1 RCT-12 


0.5075572 


Bcl-2 


0.5068672 


Phase-1 RCT-49 


0.5036029 


Beta-tubulin, class I 


0.4991521 


Calreticulin 


0.4985017 


Multidrug resistant protein-3 


0.4938303 


ADP-ribosylation factor-like protein ARL184 


0.490394 


Transferrin 


0.4883213 


Cathepsin L, sequence 2 


0.4877807 


Diacylglycerol kinase zeta 


0.4854465 


Gamma-glutamyl transpeptidase 


0.4848459 


Phase-1 RCT-111 


0.4843905 


14-3-3 zeta 


0.4822279 


Dynein light chain 1 


0.4804166 


Insulin-like growth factor binding protein 1 


0.479885 






Phase-1 RCT-281 


0.475073 


Thiol-specific antioxidant (natural killer cell-enhancing 
factor B) 


0.4740617 


Cyclin dependent kinase 4 


0.46983 


Phase-1 RCT-68 


0.4668504 


Phase-1 RCT-144 


0.4655095 


MHC class I antigen RT1.A1(f) alpha-chain


0.4641322 


c-jun 


0.4638237 


Macrophage inflammatory protein-2 alpha 


0.4544662


Superoxide dismutase Cu/Zn 


0.448226 


Stathmin 


0.4478989 


Phase-1 RCT-179 


0.447854


Phase-1 RCT-103 


0.4475651


Insulin-like growth factor binding protein 5


0.4431518 


Matrix metalloproteinase-1 


0.4405304 


Pyruvate kinase, muscle 


0.4402392


Glyceraldehyde 3-phosphate dehydrogenase 


0.4401766


Hypoxanthine-guanine phosphoribosyltransferase 


0.4357165 


Phase-1 RCT-221 


0.4341553 


Cyclin E 


0.4337104 


Peroxisomal 3-ketoacyl-CoA thiolase 2 


0.4302424


Phase-1 RCT-27 


0.4273592 


Sorbitol dehydrogenase 


0.4245579


Phase-1 RCT-198 


0.4232769


Phase-1 RCT-43 


0.4216533 


Ornithine decarboxylase 


0.4216079


Alpha-fibrinogen 


0.4215826


Phase-1 RCT-53 


0.4214711 


Phase-1 RCT-147 


0.4214167


Peroxisomal 3-ketoacyl-CoA thiolase 1 


0.4176134


Voltage-dependent anion channel 2 (Vdac2) 


0.4174675


Glutathione reductase 


0 41 ^74Q6 


Tryptophan hydroxylase 


0.4123288 


Phase-1 RCT-240 


0.4111351 


Zinc finger protein 


0.4091316 


Phase-1 RCT-228 


0.4057419 


Phase-1 RCT-14 


0.4046313 
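
The ranking used for Table 18 (genes ordered by Pearson correlation between 6 hour expression and 72 hour necrosis) can be sketched as follows. This is a minimal illustration: the function names, the per-animal data layout, and the use of a numeric necrosis score as the second variable are assumptions, since the computation itself is described in Materials and Methods rather than in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_genes(expression, necrosis):
    """expression maps gene name -> per-animal expression values;
    necrosis holds the matching per-animal necrosis scores.
    Returns (gene, coefficient) pairs, highest coefficient first."""
    scored = [(gene, pearson(values, necrosis)) for gene, values in expression.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Genes at the top of such a ranking correlate directly with necrosis, as in Table 18, and genes at the bottom correlate inversely.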
Table 19 List of Genes Whose Expression at 6 h Inversely Correlates with Liver
Necrosis at 72h, Ranked by Spearman Correlation Coefficient 



Gene


Correlation
Coefficient


Phase-1 RCT-36 


-0.1515 


Phase-1 RCT-92 


-0.15161 


Phase-1 RCT-143 


-0.15557 


Sarcoplasmic reticulum calcium ATPase 


-0.15628 


Cyclin dependent kinase 2 


-0.15633 


Gamma-glutamyl transpeptidase 


-0.15636 


Phase-1 RCT-285 


-0.1569 


Phase-1 RCT-292 


-0.15855 


Carbamyl phosphate synthetase I 


-0.15946 


Monoamine oxidase B 


-0.16151 


Phase-1 RCT-61 


-0.16227 


3-hydroxyisobutyrate dehydrogenase 


-0.1642 


Cytochrome P450 2C11 


-0.16488 


Phase-1 RCT-164 


-0.16502 


Vesicular monoamine transporter (VMAT) 


-0.1656 


Aspartate aminotransferase, mitochondrial 


-0.16581 


Axin 


-0.16584


Phase-1 RCT-13 


-0.16715 


N-hydroxy-2-acetylaminofluorene
sulfotransferase (ST1C1)


-0.16724 


Oxygen regulated protein 150 


-0.16861 


Phase-1 RCT-177 


-0.17261 


Diacylglycerol kinase zeta 


-0.17336 


Very long-chain acyl-CoA synthetase 


-0.17338 


Phase-1 RCT-277 


-0.17348 


Phase-1 RCT-256 


-0.17456 


Equilibrative nitrobenzylthioinosine-sensitive
nucleoside transporter 


-0.17592 


H-rev107 


-0.17721 


PTEN/MMAC1 


-0.17816 


Phase-1 RCT-289 


-0.17819 


Phase-1 RCT-271 


-0.17868 


Cyclin D3 


-0.17914 


Phase-1 RCT-280 


-0.17953 


Phase-1 RCT-209 


-0.18117 


Malate dehydrogenase, cytosolic


-0.18371 


Extracellular-signal-regulated kinase 1


-0.1844 
NADH-cytochrome b5 reductase


-0.18481 


Phase-1 RCT-288 


-0.18493 


Phase-1 RCT-82 


-0.18497 


Phase-1 RCT-10 


-0.18613 


Organic anion transporter 3 


-0.18615 


Phase-1 RCT-52 


-0.18746 


Phase-1 RCT-287 


-0.19026 


Carbonic anhydrase II


-0.19132 


Complement component C3 


-0.1918 


Protein tyrosine phosphatase alpha 


-0.1925 


Aldehyde dehydrogenase, microsomal 


-0.19284 


D-dopachrome tautomerase 


-0.19309 


Phase-1 RCT-218 


-0.19413 


Phase-1 RCT-89 


-0.19423 


Cytochrome P450 1A2 


-0.19844 


Phase-1 RCT-173 


-0.20095 


Phase-1 RCT-119 


-0.20097 


Matrin F/G 


-0.20244 


Phase-1 RCT-102 


-0.20574 


Cyclin dependent kinase 4 


-0.20718 


Hydroxysteroid sulfotransferase a 


-0.20766 


Lysyl hydroxylase 


-0.20785 


Phase-1 RCT-184


-0.2098 


8-oxoguanine DNA glycosylase


-0 91 9Q 


JNK1 stress activated protein kinase




Glutamine synthetase


-f) 91 45 


Pha<5P-1 RPT-9Q1 


-D 9179 


w CIUCI lUOjII 1 IC7U (HJI III (O UOval UL/AylCIoC? 


~f) 99017 

-\J.*L.C.\J \ { 


NADP-dependent isocitrate dehydrogenase, 




cytosolic 


-0.22819 


Phase-1 RCT-182


-0.2292 


DNA topoisomerase I 


-0.23083 


Selenoprotein P 


-0.23114 


C4b-binding protein 


-0.23274 


Alcohol dehydrogenase 1 


-0.23292 


Phase-1 RCT-83 


-0.23342 


Phase-1 RCT-78 


-0.23557 


17-beta hydroxysteroid dehydrogenase, type 2 


-0.23694 


Sterol carrier protein 2 


-0.23977 


Iron-responsive element-binding protein 


-0.24103 


Peroxisomal multifunctional enzyme type II 


-0.24167 


Phase-1 RCT-168


-0.24388 


Phase-1 RCT-270 


-0.24473 


3-beta-hydroxysteroid dehydrogenase (HSD3B1) 


-0.25101 


Acetyl-CoA carboxylase 


-0.2543 


Emerin 


-0.25719 


Phase-1 RCT-73 


-0.26044 
Nucleosome assembly protein


-0.26213 


Cytochrome P450 2E1 


-0.26809 


Thymidylate synthase


-0.27492 


Phase-1 RCT-161 


-0.28042 


Cholesterol 7-alpha-hydroxylase (P450 VII)


-0.28206 


Phase-1 RCT-40 


-0.28754 


Stem cell factor 


-0.28765 


Glucokinase 


-0.30523 


Tryptophan hydroxylase 


-0.30775 


Phase-1 RCT-214 


-0.31173 


Carbonic anhydrase III 


-0.31836 


Senescence marker protein-30 


-0.37821 
Table 20 List of genes whose expression at 6 hours is predictive
of liver toxicity at 72 hours 



Gene Name 


Combination 


Category* 


Argininosuccinate lyase


5 


Cathepsin L, sequence 2 


5 


c-myc 


5 


Gadd153


5 


Gadd45 


5 


Heme oxygenase 


5 


Insulin-like growth factor binding protein 1 


5 


NIPK


5 


Phase-1 RCT-207


5


Phase-1 RCT-50


5


Alpha-2-macroglobulin, sequence 2


4


c-jun


4


Phase-1 RCT-127


4


Phase-1 RCT-242


4


Phase-1 RCT-82


4 


Pyruvate kinase, muscle 


4 


Zinc finger protein 


4 


Cyclin dependent kinase 4 


3 


Focal adhesion kinase (pp125FAK)


3 


Glucokinase 


3 


Integrin beta1


3 


Interferon related developmental regulator IFRD1
(PC4)


3


NGF-inducible antiproliferative putative secreted
protein (PC3)


3




Peroxisomal multifunctional enzyme type II 


3 


Phase-1 RCT-18


3


Phase-1 RCT-49


3


Phase-1 RCT-59


3


Phase-1 RCT-72


3


Phase-1 RCT-75


3 


Proliferating cell nuclear antigen gene 


3 


Sarcoplasmic reticulum calcium ATPase 


3 


Senescence marker protein-30 


3 


14-3-3 zeta 


2 


Acetyl-CoA carboxylase 


2 


Activating transcription factor 3 


2 
C4b-binding protein


2 


Carbonic anhydrase III


2 


Cholesterol 7-alpha-hydroxylase (P450 VII)


2 


Cytochrome P450 1A1


2 


DNA topoisomerase I 


2 


Ferritin H-chain 


2 


ID-1 


2 


Iron-responsive element-binding protein


2 


Macrophage inflammatory protein-1 alpha


2 


Nucleosome assembly protein 


2 


Phase-1 RCT-110


2 


rnase-1 KL 1-123 


2 


Pnase-1 KL1-15 


2 


rnase-1 KUl-loy 


2


rnase-i k^i-i / / 


2 


DVtoo^k 1 "D PT 170 

rnase-1 KU1-1 ly 


2 


rnase-i Kul-lo2 


2 


rnase-1 KCl-iyv 


2 


til--- 1 Tj/^nn o 1 /f 

Phase- 1 K.C1-214 


2 


Phase- 1 KCl-o5 


2 


Phase- 1 KOI -71 


2


rnase-l KCl-liy 


1 


3-beta-hydroxysteroid dehydrogenase (HSD3B1) 


1 


8-oxoguanine DNA glycosylase 


1 


Alcohol dehydrogenase 1 


1 


ATI 

Al-J 


1 


Carnitine palmitoyl-CoA transferase 


1 


Caspase 6 


1 


Choline kinase


1 


Cyclin D3




oytocnrome rnou ZJii 




1


Elongation factor-1 alpha




H-rev107




1 


Insulin-like growth factor binding protein 5 


1 


Matrix metalloproteinase-1 


1 


Melanoma-associated antigen ME491 


1 


MHC class I antigen RT1.A1(f) alpha-chain


1 


Neuropeptide Y 




rndse- 1 ssx^ i - 1 yjy 


1 


Protein O-mannosyltransferase 1 (Pomt1)




Phase-1 RCT-144 




Phase-1 RCT-191 




Phase-1 RCT-20 




Phase-1 RCT-204 








DL n(lfl 1 "DOT 1 111 

l^nase-l Ka^ 1-221 


1 


T)L n „- 1 DPT OOC 

rnase- 1 1-225 


i 


D"L n « fl 1 DPT OT7 


i 


rnase- 1 KCl-24o 


1 


rnase- 1 Kv^i-2/U 


. 1 


rnase- 1 i -2 / / 


1 


PViacja 1 PPT OC7 

rnase-i Kv^i-2o/ 


1 


rnase- 1 ssx^ i -Zoi? 




PVmc*» 1 DPT 





Phacp 1 PPT AO 


; 


PliacA 1 PPT 




1 


PViaco 1 PPT 7A 




PViae^ 1 PPT 11 

rnase- 1 


T — ' 

1 


PfiflQp-l PPT-87 




Preproalbumin 




Protein kinase C alpha 




Ribosomal protein L13A




Selenoprotein P 




Tryptophan hydroxylase 





* Combination category is the number of training/test set gene list occurrences. 
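
The combination category defined in this footnote, the number of training/test set gene lists in which a gene occurs, can be sketched as a simple count. The function name and the example gene lists are hypothetical placeholders and do not reproduce the actual lists behind Table 20.

```python
from collections import Counter


def combination_categories(gene_lists):
    """Count, for each gene, how many of the training/test set
    predictive gene lists contain it."""
    counts = Counter()
    for genes in gene_lists:
        counts.update(set(genes))  # a gene counts at most once per list
    return dict(counts)


# Hypothetical predictive gene lists from five training/test sets.
lists = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB"],
    ["geneA", "geneC"],
    ["geneA", "geneB", "geneD"],
    ["geneA"],
]
categories = combination_categories(lists)
combo5 = sorted(g for g, n in categories.items() if n == 5)
```

With five training/test sets, a gene in category 5 was selected every time, while a gene in category 1 was selected in only one set, so the category acts as a rough measure of how stable a gene's selection is across data splits.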
Table 21 Liver Toxicity Compound-Dose Prediction Values for 6 Hour Data Predictive
Genes (Combined List and Subsets) 





                           Prediction Measure*
Gene Set    Number         Accuracy**               False Positive**         False Negative**         Geometric Mean**
            of Genes
Combo All   98             0.712 (0.610 - 0.833)    0.290 (0.100 - 0.431)    0.317 (0.000 - 0.750)    0.669 (0.474 - 0.804)
Combo 5     10             0.684 (0.597 - 0.756)    0.329 (0.186 - 0.477)    0.283 (0.000 - 0.750)    0.663 (0.451 - 0.794)
Combo 4     7              0.667 (0.623 - 0.756)    0.329 (0.200 - 0.431)    0.375 (0.000 - 0.625)    0.626 (0.545 - 0.754)
Combo 3     15             0.646 (0.534 - 0.704)    0.363 (0.254 - 0.508)    0.317 (0.083 - 0.500)    0.648 (0.598 - 0.722)
Combo 2     24             0.684 (0.571 - 0.833)    0.308 (0.100 - 0.462)    0.400 (0.000 - 0.750)    0.613 (0.474 - 0.744)
Combo 1     42             0.618 (0.494 - 0.846)    0.367 (0.086 - 0.569)    0.500 (0.167 - 0.750)    0.526 (0.385 - 0.617)
Table 22 Comparison of Predictivity for Correct Liver Toxicity Classification and Random
Classification Using Combo Gene Sets 6 h data. 







                                Accuracy,                   Accuracy,
                                Correct Classification**    Random Classification**
Gene List*      Gene Subset*    Mean (Min - Max)            Mean (Min. - Max.)
Combo All       All Genes       0.712 (0.610 - 0.833)       0.199 (0.103 - 0.282)
Combo 5         All Genes       0.684 (0.597 - 0.756)       0.221 (0.090 - 0.288)
Combo 4         All Genes       0.667 (0.623 - 0.756)       0.231 (0.090 - 0.366)
Combo 3         All Genes       0.646 (0.534 - 0.704)       0.233 (0.143 - 0.324)
Combo 2         All Genes       0.684 (0.571 - 0.833)       0.244 (0.192 - 0.366)
Combo 1         All Genes       0.618 (0.494 - 0.846)       0.232 (0.128 - 0.273)



* Combo Gene Lists as in Example 1, Table 1. For Combo lists all genes were used
for prediction.



** Accuracy = proportion of the total number of predictions that are correct. Non- 
calls are counted as incorrect predictions. Accuracy was calculated for correct 
classifications of liver toxicity assigned to the samples and for randomized 
classifications in the same proportions as the correct classifications. Values presented 
are the mean accuracy values for 5 training/test sets with minimum and maximum 
accuracy values. 
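
The comparison against randomized classifications described in the footnote above can be sketched as follows. This is a minimal illustration and not the prediction program used to produce Table 22: the function names, the example labels, and the use of Python's `random` module are assumptions introduced here. The sketch shuffles the assigned toxicity labels, which preserves the proportion of positive and negative samples, and scores predictions with non-calls (represented as `None`) counted as incorrect.

```python
import random


def accuracy(predictions, truth):
    """Proportion of all predictions that are correct.

    A non-call is represented by None and is counted as incorrect.
    """
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    correct = sum(1 for p, t in zip(predictions, truth) if p is not None and p == t)
    return correct / len(truth)


def randomized_labels(truth, seed=0):
    """Return the same labels in shuffled order (class proportions preserved)."""
    shuffled = list(truth)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical example: 3 toxic ("yes") and 5 non-toxic ("no") samples.
truth = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
predictions = ["yes", "yes", None, "no", "no", "no", "yes", "no"]

print(accuracy(predictions, truth))            # 6 of 8 correct -> 0.75
random_truth = randomized_labels(truth, seed=1)
print(sorted(random_truth) == sorted(truth))   # proportions unchanged -> True
```

In such a scheme a classifier would be retrained and scored against `random_truth`; an accuracy for the correct classification that is well above the accuracy for the randomized one suggests the gene set carries real predictive information.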
Table 23 Distribution of Compounds* in Individual Training and Test Sets for 72 Hour Liver Data

Training and Test Set 1

Training Set 1    Training Set 1    Test Set 1    Test Set 1
Negative          Positive          Negative      Positive
5-FU              APAP              BUS           AFLB
AMPB              BRB               CIS           ANIT
AZA               CCL4              CLOZ          DMN
BAP               LPS               CMC
BEN               TET               DEX
CAD                                 DIF
CAR                                 ERY
CHCL3                               GAN
CHEX                                HYD
CHLOR                               PHEN
CLO                                 QUEST
CPHOS                               STRZ
CYCA                                TAM
DOX                                 THEO
EST
ETH
GEN
ISON
KETO
MET
NAL
PBARB
PEG
PUR








Training and Test Set 2 


Training Set 2 


Training Set 2 


Test Set 2 


Test Set 2 


Negative 


Positive 


Negative 


Positive 


5-FU 


AFLB 


AMPB 


APAP 


BAP


ANIT 


AZA 


CCL4


BUS 


BRB 


BEN 


TET 


CAR 


DMN 


CAD 




CHCL3 


LPS 


CHLOR 




CHEX 




CLO 
CIS








CLOZ 








CMC 




J3o 1 




CPHOS 




tJTU 
J3 J.X1 




CYCA 




JSJti 1 VJ 




DIF 




lVJLCf 1 




ERY 




XT AT 




/—i A XT 

GAN 








GEN 








HYD 








ISON 








PBARB 
















PUR 








QUEST 








STRZ 








TAM 








THEO 









Training and Test Set 3

Training Set 3    Training Set 3    Test Set 3    Test Set 3
Negative          Positive          Negative      Positive
AMPB              APAP              5-FU          AFLB
BAP               BRB               AZA           ANIT
BEN               CCL4              CHEX          LPS
BUS               DMN               CLO
CAD               TET               CMC
CAR                                 CPHOS
CHCL3                               CYCA
CHLOR                               DIF
CIS                                 EST
CLOZ                                HYD
DEX                                 MET
DOX                                 PEG
ERY                                 PUR
ETH                                 THEO
GAN
GEN
ISON
KETO
NAL
PBARB
PHEN
QUIN
STRZ
TAM









Training and Test Set 4 



Training Set 4
Negative


Training Set 4
Positive


Test Set 4
Negative


Test Set 4
Positive


5-FU


AXTTT 


A MP J} 

/VlYXx XJ 


AFLB 




APAP 


BUS 


BRB 


n at) 


PPT A 


CAD 


TET 


jJJtiJN 


DA/TNT 


CIS 




CAR 


T PQ 


PT O 




HTTPT 1 




PMC 




CixbA 




FTH 

U/ X XX 




CxiLUK 




jyxxi x 




CLUZr 




MAT 

iN/vXv 




CPHOS 




PttAPR 




CYCA 




PFO 




DEX 




PfflPN 




DIF 




PUR 




DOX 




THEO 




ERY 








EST 








GAN 








GEN 








HYD 








ISON 








KETO 








QUIN








STRZ 








TAM 









Training and Test Set 5

Training Set 5    Training Set 5    Test Set 5    Test Set 5
Negative          Positive          Negative      Positive
AZA               APAP              5-FU          AFLB
BAP               BRB               AMPB          ANIT
BUS               CCL4              BEN           LPS
CHEX              DMN               CAD
CIS               TET               CAR
CLO                                 CHCL3
CLOZ                                CHLOR
CMC                                 CPHOS
DEX                                 CYCA
DIF                                 MET
DOX                                 PBARB
ERY                                 PEG
EST                                 PHEN
ETH                                 PUR
GAN
GEN
HYD
ISON
KETO
NAL
QUIN
STRZ
TAM
THEO









* For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.) 
** Negative = Compounds that did not elicit histopathology (score = 1)

Positive = Compounds that did elicit histopathology (score of 2 or greater)
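
The layout of Table 23 (each compound assigned as a whole to either a training set or a test set, separately for negative and positive compounds) can be illustrated with the sketch below. How the compounds were actually assigned is not stated in this table, so the random draw, the function name `split_compounds`, the seed, and the shortened negative list are all assumptions made only for illustration.

```python
import random


def split_compounds(negatives, positives, n_test_neg, n_test_pos, seed=0):
    """Assign whole compounds to training or test sets, class by class.

    Assumption: assignment is a seeded random draw within each class.
    """
    rng = random.Random(seed)
    neg = sorted(negatives)
    pos = sorted(positives)
    test_neg = set(rng.sample(neg, n_test_neg))
    test_pos = set(rng.sample(pos, n_test_pos))
    return {
        "train_negative": [c for c in neg if c not in test_neg],
        "train_positive": [c for c in pos if c not in test_pos],
        "test_negative": sorted(test_neg),
        "test_positive": sorted(test_pos),
    }


# Positive compounds as abbreviated in Table 23; the negative list is
# shortened here to keep the example small.
positives = ["AFLB", "ANIT", "APAP", "BRB", "CCL4", "DMN", "LPS", "TET"]
negatives = ["5-FU", "AMPB", "AZA", "BAP", "BEN", "BUS"]

split = split_compounds(negatives, positives, n_test_neg=2, n_test_pos=3, seed=1)
print(len(split["train_positive"]), len(split["test_positive"]))  # 5 3
```

Splitting by compound rather than by animal keeps every animal treated with a given compound on the same side of the split, which is the arrangement the table shows.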
Table 24 List of Genes Whose Expression at 72 h Directly Correlates with Liver Necrosis at 72 h, Ranked by Pearson Correlation Coefficient

Gene                                                   Correlation Coefficient
Osteoactivin                                           0.7351
Calpactin I heavy chain                                0.6821
IgE binding protein                                    0.6393
Stathmin                                               0.6238
Melanoma-associated antigen ME491                      0.6196
Phase-1 RCT-68                                         0.6127
High affinity IgE receptor gamma chain (FcERIgamma)    0.5971
Phase-1 RCT-121                                        0.5840
Phase-1 RCT-179                                        0.5815
Gamma-actin, cytoplasmic                               0.5770
Phase-1 RCT-154                                        0.5761
Thymosin beta-10                                       0.5760
Alpha-tubulin                                          0.5706
14-3-3 zeta                                            0.5688
Voltage-dependent anion channel (Vdac2)                0.5651
Phase-1 RCT-192                                        0.5593
Phase-1 RCT-138                                        0.5574
Uncoupling protein 2                                   0.5476
Phase-1 RCT-24                                         0.5383
Beta-actin                                             0.5285
60S ribosomal protein L6                               0.5232
Phase-1 RCT-146                                        0.5016
Collagen type II                                       0.4978
Cofilin                                                0.4868
Beta-tubulin, class I                                  0.4827
Pyruvate kinase, muscle                                0.4816
Calpain 2                                              0.4808
Annexin V                                              0.4786
Phase-1 RCT-144                                        0.4773
Phase-1 RCT-207                                        0.4762
Organic cation transporter 3                           0.4760
Phase-1 RCT-12                                         0.4744
Tissue inhibitor of metalloproteinases-1               0.4729
Beta-actin, sequence 2                                 0.4674
Phase-1 RCT-293                                        0.4623
Cyclin G                                               0.4586
Cathepsin S                                            0.4472
Multidrug resistant protein-2                          0.4446
Phase-1 RCT-211                                        0.4420
Multidrug resistant protein-1                          0.4402
Cyclin D1                                              0.4382
Nucleoside diphosphate kinase beta isoform             0.4331
Biliverdin reductase                                   0.4310
60S ribosomal protein L6 (alternate clone 1)           0.4308
Phase-1 RCT-215                                        0.4231
Cathepsin B                                            0.4180
Phase-1 RCT-37                                         0.4077
Ribosomal protein S8                                   0.4072
Ribosomal protein S9                                   0.4040
Heme oxygenase                                         0.4033
CD44 metastasis suppressor gene                        0.4021
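
A ranking of the kind shown in Table 24 can be sketched as below. This is an illustration only: the gene names, expression values, and necrosis scores are hypothetical, and the assumption that each gene's expression is correlated against a per-animal histopathology score is made here, not taken from this table.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_genes(expression, scores):
    """Return (gene, coefficient) pairs, highest coefficient first."""
    coefficients = {gene: pearson(values, scores) for gene, values in expression.items()}
    return sorted(coefficients.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: expression of two genes in five animals and the
# necrosis score of each animal.
necrosis_scores = [1, 1, 2, 3, 4]
expression = {
    "gene_a": [0.1, 0.2, 0.9, 1.4, 2.1],   # rises with necrosis
    "gene_b": [1.0, 0.9, 0.5, 0.4, 0.1],   # falls with necrosis
}

for gene, r in rank_genes(expression, necrosis_scores):
    print(f"{gene}\t{r:.4f}")
```

Genes with large positive coefficients would appear in a direct-correlation list such as Table 24, and genes with negative coefficients in an inverse-correlation list.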
Table 25 List of Genes Whose Expression at 72 h Inversely Correlates with Liver Necrosis at 72 h, Ranked by Spearman Correlation Coefficient



Gene


Correlation
Coefficient


DVioo^ 1 DPT 197 

rnase-i kui-izo 




ir nase-i kujl-jlod 


A 9^1 6 


Cholesterol esterase 


A 9*\1 8 


C-reactive protein


-0 951 R 


T)L QCIO i npT 0£A 

rnase-i kui-zou 




Retinol dehydrogenase type III


A 954^ 


rnase-1 KUl-o/ 


A 95^1 


Aquaponn-j (/u^roj 


A 956S 


NADH-cytochrome b5 reductase


-A 9585 


rnase-1 KLl-z/o 


A O^AzL 


Interferon inducible protein 10


A 966? 


Acetylcholine receptor epsilon


-A 9675 




-A 9676 


DV»ooo 1 DPT 01Q 

rnase-1 KCL-ziy 


-A 968^ 


DVtae^ 1 DPT 7^ 

r nase- 1 s\\^ 1-/0 




Phase-1 RCT-29                                                          -0.2707
Gap junction membrane channel protein beta 1 (Gjb1)                     -0.2731
Phase-1 RCT-285                                                         -0.2735
Phase-1 RCT-38                                                          -0.2746
Cytochrome P450 2D18                                                    -0.2768
Phase-1 RCT-227                                                         [illegible]
Matrin F/G                                                              -0.2781
Phase-1 RCT-33                                                          -0.2809
Phase-1 RCT-280                                                         [illegible]
Equilbrative nitrobenzylthioinosine-sensitive nucleoside transporter    -0.2827
L-gulono-gamma-lactone oxidase                                          -0.2837
Aryl sulfotransferase                                                   -0.2838
alpha-1,2-fucosyltransferase                                            -0.2848
Phase-1 RCT-98                                                          -0.2853
Urinary protein 2 precursor                                             -0.2874
Tyrosine hydroxylase                                                    -0.2897
Cytochrome P450 3A1                                                     -0.2910
NLPK                                                                    -0.2926
Protein tyrosine phosphatase, receptor type, D                          -0.2952
Contrapsin-like protease inhibitor (CPi-21)                             -0.2961
Phase-1 RCT-187                                                         -0.2963
Connexin-32                                                             -0.2995
Phase-1 RCT-81                                                          -0.2999
Phase-1 RCT-256                                                         -0.3038
Cytochrome P450 2A3                                                     -0.3078
Insulin-like growth factor I                                            -0.3079
Apolipoprotein CIII                                                     -0.3097
Phase-1 RCT-292                                                         -0.3099
Phase-1 RCT-178                                                         -0.3122
Phase-1 RCT-102                                                         -0.3187
Arginosuccinate synthetase 1                                            -0.3193
Fatty acid synthase                                                     -0.3234
Aldehyde dehydrogenase 2                                                -0.3355
N-hydroxy-2-acetylaminofluorene sulfotransferase (ST1C1)                -0.3355
Phase-1 RCT-48                                                          -0.3428
Phase-1 RCT-149                                                         -0.3456
Phase-1 RCT-117                                                         -0.3466
JNK1 stress activated protein kinase                                    -0.3517
Phase-1 RCT-36                                                          -0.3552
Phase-1 RCT-78                                                          -0.3568
Phase-1 RCT-164                                                         -0.3596
Stearyl-CoA desaturase, liver                                           -0.3666
Glycine methyltransferase                                               -0.3758
Dynamin-1 (D100)                                                        -0.3774
Betaine homocysteine methyltransferase (BHMT)                           -0.3779
Phase-1 RCT-107                                                         -0.3869
Cytochrome P450 2C11                                                    -0.3876
Phase-1 RCT-290                                                         -0.4002
Apolipoprotein AII                                                      -0.4022
Insulin-like growth factor I, exon 6                                    -0.4110
Alpha-2-microglobulin                                                   -0.4294
Table 26 List of genes whose expression at 72 hours is predictive of liver toxicity at 72 hours

Gene Name                                              Combination Category
Calpactin I heavy chain                                5
Osteoactivin                                           5
60S ribosomal protein L6                               4
Collagen type II                                       4
Gamma-actin, cytoplasmic                               4
Glycine methyltransferase                              4
High affinity IgE receptor gamma chain (FcERIgamma)    4
IgE binding protein                                    4
Phase-1 RCT-179                                        4
Phase-1 RCT-192                                        4
Stathmin                                               4
Thymosin beta-10                                       4
Uncoupling protein 2                                   4
Alpha-2-microglobulin                                  3
Alpha-tubulin                                          3
Biliverdin reductase                                   3
Cofilin                                                3
Heme oxygenase                                         3
Melanoma-associated antigen ME491                      3
Multidrug resistant protein-2                          3
Phase-1 RCT-121                                        3
Phase-1 RCT-138                                        3
Phase-1 RCT-146                                        3
Voltage-dependent anion channel (Vdac2)                3
Phase-1 RCT-39                                         3
Phase-1 RCT-68                                         3
Ribosomal protein S9                                   3
14-3-3 zeta                                            2
Adenine nucleotide translocator 1                      2
Alpha-2-macroglobulin, sequence 2                      2
Annexin V                                              2
Beta-actin                                             2
Beta-actin, sequence 2                                 2
Beta-tubulin, class I                                  2
Calpain 2                                              2
Cyclin D1                                                               2
Cystatin C                                                              2
Cytochrome P450 2C11                                                    2
Glutathione S-transferase theta-1                                       2
Insulin-like growth factor I, exon 6                                    2
Multidrug resistant protein-1                                           2
Nucleoside diphosphate kinase beta isoform                              2
Organic cation transporter 3                                            2
Phase-1 RCT-107                                                         2
Phase-1 RCT-12                                                          2
Phase-1 RCT-144                                                         2
Phase-1 RCT-154                                                         2
Phase-1 RCT-207                                                         2
Phase-1 RCT-211                                                         2
Phase-1 RCT-215                                                         2
Phase-1 RCT-24                                                          2
Phase-1 RCT-78                                                          2
Phase-1 RCT-81                                                          2
60S ribosomal protein L6 (alternate clone 1)                            1
Aldehyde dehydrogenase 2                                                1
Alpha-1 microglobulin/bikunin precursor (Ambp)                          1
Alpha-prothymosin                                                       1
Apolipoprotein AII                                                      1
Apolipoprotein CI                                                       1
Apolipoprotein CIII                                                     1
Arginosuccinate synthetase 1                                            1
Urinary protein 2 precursor                                             1
Betaine homocysteine methyltransferase (BHMT)                           1
Cathepsin B                                                             1
Cathepsin S                                                             1
Cholesterol esterase                                                    1
Connexin-32                                                             1
Contrapsin-like protease inhibitor (CPi-21)                             1
C-reactive protein                                                      1
Cyclin G                                                                1
Cytochrome P450 2C23                                                    1
Cytochrome P450 2D18                                                    1
Dynamin-1 (D100)                                                        1
Equilbrative nitrobenzylthioinosine-sensitive nucleoside transporter    1
Fatty acid synthase                                         1
Gap junction membrane channel protein beta 1 (Gjb1)         1
Hypoxanthine-guanine phosphoribosyltransferase              1
Insulin-like growth factor I                                1
Interleukin-18                                              1
JNK1 stress activated protein kinase                        1
Lecithin:cholesterol acyltransferase                        1
L-gulono-gamma-lactone oxidase                              1
Matrin F/G                                                  1
NADH-cytochrome b5 reductase                                1
N-hydroxy-2-acetylaminofluorene sulfotransferase (ST1C1)    1
p53                                                         1
p55CDC                                                      1
Phase-1 RCT-102                                             1
Phase-1 RCT-109                                             1
Phase-1 RCT-145                                             1
Phase-1 RCT-149                                             1
Phase-1 RCT-164                                             1
Phase-1 RCT-173                                             1
Phase-1 RCT-185                                             1
Phase-1 RCT-187                                             1
Phase-1 RCT-219                                             1
Phase-1 RCT-227                                             1
Phase-1 RCT-230                                             1
Phase-1 RCT-256                                             1
Phase-1 RCT-278                                             1
Phase-1 RCT-285                                             1
Phase-1 RCT-290                                             1
Phase-1 RCT-292                                             1
Phase-1 RCT-293                                             1
Phase-1 RCT-33                                              1
Phase-1 RCT-36                                              1
Phase-1 RCT-37                                              1
Phase-1 RCT-38                                              1
Phase-1 RCT-48                                              1
Phase-1 RCT-58                                              1
Phase-1 RCT-61                                              1
Proliferating cell nuclear antigen gene                     1
Protein tyrosine phosphatase, receptor type, D              1
PTEN/MMAC1                                                  1
Pyruvate kinase, muscle                                     1
Retinol dehydrogenase type III                              1
Ribosomal protein S8                                        1
Stearyl-CoA desaturase, liver                               1
Thymidylate synthase                                        1
Ubiquitin conjugating enzyme (RAD 6 homologue)              1
Zinc finger protein                                         1





* Combination category is the number of training/test set gene list occurrences. 
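
The combination category defined in the footnote above (the number of training/test set gene lists on which a gene occurs) can be computed as in the following sketch. The five gene lists and the gene names are hypothetical placeholders, and the helper names are introduced here only for illustration.

```python
from collections import Counter


def combination_categories(gene_lists):
    """Count, for each gene, how many of the gene lists contain it."""
    counts = Counter()
    for gene_list in gene_lists:
        counts.update(set(gene_list))  # a gene counts once per list
    return counts


def combo_subset(counts, category):
    """Genes that occur on exactly `category` lists (e.g. 'Combo 5')."""
    return sorted(gene for gene, n in counts.items() if n == category)


# Hypothetical predictive gene lists from five training/test sets.
gene_lists = [
    ["gene_a", "gene_b", "gene_c"],
    ["gene_a", "gene_b"],
    ["gene_a", "gene_c"],
    ["gene_a", "gene_d"],
    ["gene_a", "gene_b"],
]

counts = combination_categories(gene_lists)
print(combo_subset(counts, 5))  # ['gene_a']
print(combo_subset(counts, 3))  # ['gene_b']
print(sorted(counts))           # "Combo All": every gene on at least one list
```

Under this reading, "Combo All" is the union of the five lists and "Combo N" is the subset of genes that appear on exactly N of them.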
Table 27 Liver Toxicity Compound-Dose Prediction Values for 72 Hour Data Predictive Genes (Combined List and Subsets)

                         Prediction Measure* **
Gene List                Accuracy    False Positive Rate    False Negative Rate    Geometric Mean
Combo All    Mean        0.790       0.192                  0.342                  0.729
             Minimum     0.690       0.134                  0.250                  0.642
             Maximum     0.835       [illegible]            0.417                  0.775

Combo 5      Mean        0.641       0.351                  0.417                  0.615
             Minimum     0.523       0.209                  0.333                  0.513
             Maximum     0.772       0.474                  0.500                  0.726

Combo 4      Mean        0.749       0.226                  0.417                  0.664
             Minimum     0.652       0.147                  0.333                  0.533
             Maximum     0.823       0.350                  0.667                  0.753

Combo 3      Mean        0.715       0.269                  0.400                  0.660
             Minimum     0.699       0.244                  0.333                  0.558
             Maximum     0.747       0.293                  0.583                  0.710

Combo 2      Mean        0.713       0.261                  0.500                  0.602
             Minimum     0.644       0.192                  0.333                  0.524
             Maximum     0.767       0.320                  0.625                  0.710

Combo 1      Mean        0.570       0.449                  0.275                  0.631
             Minimum     0.529       0.403                  0.125                  0.551
             Maximum     0.620       0.480                  0.417                  0.692

* Prediction measures are given as means and range of values for five training/test 
sets using 72 hour array data and gene lists as presented in Example 5. Unit of prediction was 
the animal and the predictive classification was for liver necrosis observed at 72 hours after 
treatment. 

** Standard prediction measures were used as defined in Materials and Methods. As
described in Materials and Methods, in these analyses a case where no prediction was made
because the p-value ratio exceeded the cutoff value (generally 0.5) was treated as a
non-call and counted as incorrect.
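
The prediction measures reported in Table 27 can be sketched as follows. The definitions used by the inventors are given in Materials and Methods, which is outside this section, so the formulas here are assumptions: the false positive and false negative rates are taken as one minus specificity and one minus sensitivity, non-calls (`None`) are counted as incorrect for the class they belong to, and the geometric mean is taken as the square root of sensitivity times specificity. That last assumption is at least consistent with the Combo All means in Table 27, since sqrt((1 - 0.192) x (1 - 0.342)) is approximately 0.729.

```python
import math


def prediction_measures(predictions, truth):
    """Accuracy, error rates and geometric mean for yes/no predictions.

    predictions: True (toxic), False (non-toxic) or None (non-call).
    truth:       True or False for each sample.
    """
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    true_pos = sum(1 for p, t in zip(predictions, truth) if t and p is True)
    true_neg = sum(1 for p, t in zip(predictions, truth) if not t and p is False)
    sensitivity = true_pos / n_pos
    specificity = true_neg / n_neg
    return {
        "accuracy": (true_pos + true_neg) / len(truth),
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        "geometric_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical example: 4 toxic and 6 non-toxic animals, two non-calls.
truth = [True] * 4 + [False] * 6
predictions = [True, True, None, False, False, False, False, False, True, None]

for name, value in prediction_measures(predictions, truth).items():
    print(f"{name}: {value:.3f}")
```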
Table 28 Comparison of Predictivity for Correct Liver Toxicity Classification and Random Classification Using Combo Gene Sets 72 h data

                         Accuracy * **
Gene List                Correct Classification    Random Classification
Combo All    Mean        0.790                     [illegible]
             Minimum     0.690                     [illegible]
             Maximum     0.835                     [illegible]

Combo 5      Mean        0.641                     0.261
             Minimum     0.523                     [illegible]
             Maximum     0.772                     0.418

Combo 4      Mean        0.749                     0.257
             Minimum     0.652                     [illegible]
             Maximum     0.823                     [illegible]

Combo 3      Mean        0.715                     0.281
             Minimum     0.699                     0.161
             Maximum     0.747                     0.367

Combo 2      Mean        0.713                     0.211
             Minimum     0.644                     0.115
             Maximum     0.767                     0.304

Combo 1      Mean        0.570                     0.235
             Minimum     0.529                     0.023
             Maximum     0.620                     0.354


* Combo Gene Lists as in Example 1, Table 1. For Combo lists all genes were used 
for prediction. 

** Accuracy = proportion of the total number of predictions that are correct. Non- 
calls are counted as incorrect predictions. Accuracy was calculated for correct 
classifications of liver toxicity assigned to the samples and for randomized 
classifications in the same proportions as the correct classifications. Values presented 
are the mean accuracy values for 5 training/test sets with minimum and maximum 
accuracy values. 
Table 29 Prediction of Liver Toxicity for Samples External to Database

                                                                Prediction Values**
Predicting                                                                  P-Value   No      No        Yes     Yes
Gene Set*    Treatment                             Animal    Prediction    Ratio     Votes   P-Value   Votes   P-Value
Combo 6      Cephaloridine 1500 mg/kg i.p. 24 h    501       yes           0.000     0       1         10      0
Combo 6      Cephaloridine 1500 mg/kg i.p. 24 h    506       yes           0.000     0       1         10      0
Combo 6      Cephaloridine 1500 mg/kg i.p. 24 h    508       yes           0.000     0       1         10      0
Combo 6      Cisplatin 20 mg/kg i.p. 24 h          602       yes           0.000     2       1         8       0
Combo 6      Cisplatin 20 mg/kg i.p. 24 h          603       yes           0.000     0       1         10      0
Combo 6      Cisplatin 20 mg/kg i.p. 24 h          604       yes           0.000     0       1         10      0

Combo 5      Cephaloridine 1500 mg/kg i.p. 24 h    501       yes           0.001     4       1         6       0.001
Combo 5      Cephaloridine 1500 mg/kg i.p. 24 h    506       yes           0.000     1       1         9       0
Combo 5      Cephaloridine 1500 mg/kg i.p. 24 h    508       yes           0.000     2       1         8       0
Combo 5      Cisplatin 20 mg/kg i.p. 24 h          602       yes           0.208     7       0.945     3       0.197
Combo 5      Cisplatin 20 mg/kg i.p. 24 h          603       yes           0.208     7       0.945     3       0.197
Combo 5      Cisplatin 20 mg/kg i.p. 24 h          604       yes           0.001     4       1         6       0.001

Combo 4      Cephaloridine 1500 mg/kg i.p. 24 h    501       yes           0.000     1       1         9       0
Combo 4      Cephaloridine 1500 mg/kg i.p. 24 h    506       yes           0.000     2       1         8       0
Combo 4      Cephaloridine 1500 mg/kg i.p. 24 h    508       yes           0.000     0       1         10      0
Combo 4      Cisplatin 20 mg/kg i.p. 24 h          602       yes           0.010     5       0.999     5       0.01
Combo 4      Cisplatin 20 mg/kg i.p. 24 h          603       yes           0.000     1       1         9       0
Combo 4      Cisplatin 20 mg/kg i.p. 24 h          604       yes           0.000     1       1         9       0

Combo 3      Cephaloridine 1500 mg/kg i.p. 24 h    501       yes           0.001     4       1         6       0.001
Combo 3      Cephaloridine 1500 mg/kg i.p. 24 h    506       yes           0.208     7       0.945     3       0.197
Combo 3      Cephaloridine 1500 mg/kg i.p. 24 h    508       -             0.606     8       0.803     2       0.487
Combo 3      Cisplatin 20 mg/kg i.p. 24 h          602       yes           0.208     7       0.945     3       0.197
Combo 3      Cisplatin 20 mg/kg i.p. 24 h          603       yes           0.001     4       1         6       0.001
Combo 3      Cisplatin 20 mg/kg i.p. 24 h          604       yes           0.055     6       0.99      4       0.055

Combo 2      Cephaloridine 1500 mg/kg i.p. 24 h    501       yes           0.000     3       1         7       0
Combo 2      Cephaloridine 1500 mg/kg i.p. 24 h    506       yes           0.000     3       1         7       0
Combo 2      Cephaloridine 1500 mg/kg i.p. 24 h    508       yes           0.000     3       1         7       0
Combo 2      Cisplatin 20 mg/kg i.p. 24 h          602       yes           0.010     5       0.999     5       0.01
Combo 2      Cisplatin 20 mg/kg i.p. 24 h          603       yes           0.000     3       1         7       0
Combo 2      Cisplatin 20 mg/kg i.p. 24 h          604       yes           0.000     2       1         8       0

Combo 1      Cephaloridine 1500 mg/kg i.p. 24 h    501       yes           0.000     1       1         9       0
Combo 1      Cephaloridine 1500 mg/kg i.p. 24 h    506       yes           0.000     1       1         9       0
Combo 1      Cephaloridine 1500 mg/kg i.p. 24 h    508       yes           0.000     3       1         7       0
Combo 1      Cisplatin 20 mg/kg i.p. 24 h          602       yes           0.001     4       1         6       0.001
Combo 1      Cisplatin 20 mg/kg i.p. 24 h          603       yes           0.000     3       1         7       0
Combo 1      Cisplatin 20 mg/kg i.p. 24 h          604       yes           0.000     3       1         7       0

* All genes used for Combo Gene Lists as in Example 1, Table 1.

** Prediction values are output from the prediction program. Values include prediction
(yes=liver toxicity predicted, no=no liver toxicity predicted), numbers of yes and no votes
from 10 nearest neighbors, the p-value for the no and yes votes and the p-value ratio for the
predicted class over the not predicted class. A p-value ratio cutoff of 0.5 was used.
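
The p-value ratio rule described in the footnote can be sketched as below. This is not the prediction program itself: the function name is hypothetical, how the p-values are derived from the nearest-neighbor votes is not modeled, and the choice of the class with the smaller p-value as the predicted class is an assumption that appears consistent with the rows of Table 29 (for example, a "yes" prediction with a yes p-value of 0.197 and a no p-value of 0.945).

```python
def call_prediction(no_p_value, yes_p_value, cutoff=0.5):
    """Return (prediction, p_value_ratio).

    The ratio is the p-value of the predicted class over that of the
    not-predicted class. If it exceeds the cutoff, no call is made and
    the prediction is None.
    """
    if yes_p_value <= no_p_value:
        predicted, p_pred, p_other = "yes", yes_p_value, no_p_value
    else:
        predicted, p_pred, p_other = "no", no_p_value, yes_p_value
    ratio = p_pred / p_other if p_other > 0 else float("inf")
    if ratio > cutoff:
        return None, ratio
    return predicted, ratio


# P-values taken from rows of Table 29.
print(call_prediction(0.945, 0.197))  # 'yes', ratio of about 0.208
print(call_prediction(0.803, 0.487))  # None, ratio of about 0.606 (non-call)
print(call_prediction(1, 0))          # 'yes', ratio 0.0
```

The second example corresponds to the one row of Table 29 with an empty Prediction cell, where the ratio of 0.606 is above the 0.5 cutoff.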






Table 30 K-means Cluster Analysis of Combo 5, 4, 3 and 2 Gene Set

Cluster 1
[illegible]
[illegible] protein 1
RCT-68
RCT-39
Integrin beta1
Zinc finger protein
RCT-50
c-myc
RCT-144
PAR interacting protein
RCT-145
RCT-49
RCT-213
RCT-258
RCT-241
Gadd45
Heme oxygenase
14-3-3 zeta
Beta-actin
Ornithine decarboxylase
RCT-207
Bax (alpha)
ID-1
RCT-180
RCT-191

Cluster 2
Senescence marker protein-30
RCT-33
RCT-36
RCT-139

Cluster 3
RCT-78
L-gulono-gamma-lactone oxidase
RCT-256
RCT-38
RCT-296
RCT-92
Dynamin-1 (D100)
RCT-128
RCT-264
RCT-89
RCT-270
RCT-182
RCT-291
Hepatic lipase
RCT-271
Matrin F/G
RCT-288
RCT-189

Cluster 4
N-hydroxy-2-acetylaminofluorene sulfotransferase (ST1C1)
Equilbrative nitrobenzylthioinosine-sensitive nucleoside transporter
Paraoxonase 1
Insulin-like growth factor binding protein 3
RCT-83
Epidermal growth factor

Cluster 5
Phase-1 RCT-40

Cluster 6
Alpha 1-inhibitor III
RCT-48
RCT-102
RCT-117

Cluster 7
Cathepsin L, sequence 2
RCT-179
Ribosomal protein S17
60S ribosomal protein L6
Voltage-dependent anion channel 2 (Vdac2)

Cluster 8
Carbonic anhydrase III
Organic anion transporter 3
RCT-123
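
A K-means grouping of genes by expression profile, as in Table 30, can be illustrated with the minimal sketch below. It is a plain implementation of Lloyd's algorithm with squared Euclidean distance; the initialization from the first k profiles, the distance measure, the gene names, and the two-value profiles are all assumptions for illustration and are not taken from the analysis that produced the table, which reports eight clusters.

```python
def kmeans(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm.

    Returns (assignments, centroids). Centroids are initialized from the
    first k points so that the result is deterministic.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Hypothetical expression profiles (two conditions per gene).
profiles = {
    "gene_1": (0.1, 0.2),
    "gene_2": (5.0, 5.1),
    "gene_3": (0.0, 0.3),
    "gene_4": (5.2, 4.9),
}
genes = list(profiles)
assignments, centroids = kmeans([profiles[g] for g in genes], k=2)
for gene, cluster in zip(genes, assignments):
    print(gene, "-> cluster", cluster + 1)
```

With real data the result depends on the initialization and on how profiles are normalized, so a production analysis would normally use several restarts.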
Table 31 RCT genes (ESTs) Predictive for Liver Necrosis at 72 hours: Best Homology Matches

Gene Name      Homology
RCT-102        Mouse pentylenetetrazol-related mRNA PTZ-17 (3'UTR of E3.1)
RCT-107        no significant homology found
RCT-109        Rattus norvegicus nesprin-1 mRNA
RCT-110        Homo sapiens, clone IMAGE:3677434, mRNA
RCT-117        no significant homology found
RCT-12         no significant homology found
RCT-121        no significant homology found
[illegible]    no significant homology found
RCT-127        no significant homology found
RCT-128        Mus musculus angiopoietin-related protein 3 (Angptl3)
[illegible]    Mus musculus adult male tongue cDNA
[illegible]    Mus musculus DAP10 (Dap10) gene
[illegible]    no significant homology found
RCT-144        Mus musculus, similar to nucleolar protein (KKE/D repeat), clone IMAGE:[illegible], mRNA, partial cds.
RCT-145        Mus musculus 10 day old male pancreas cDNA, RIKEN full-length enriched library, clone:[illegible], full insert sequence
RCT-146        Mus musculus 8 days embryo cDNA, RIKEN full-length enriched library, clone:5730458E20
RCT-149        Mouse mRNA fragment for serum amyloid A (SAA) 3 protein
RCT-15         Mus musculus ubiquitin conjugating enzyme 7 mRNA, complete cds
RCT-152        Mus musculus, eukaryotic translation elongation factor 1 beta 2, clone MGC:[illegible] IMAGE:[illegible], mRNA, complete cds.
RCT-154        Mus musculus vacuolar ATPase subunit D (Atp6m) mRNA, complete cds
RCT-162        Mus musculus, clone IMAGE:3501507
RCT-164        Mus musculus adult male testis cDNA, RIKEN full-length enriched library, clone:4932443D16
RCT-168        M. musculus mRNA for low density lipoprotein receptor, ACCESSION X64414
RCT-169        Mus musculus, small inducible cytokine B subfamily (Cys-X-Cys), member 9, clone MGC:6179 IMAGE:3257716, mRNA, complete
RCT-173        Mus musculus NADP+-specific isocitrate dehydrogenase mRNA, complete cds; nuclear gene for mitochondrial product
RCT-177        Mus musculus, Similar to peroxisomal delta3, delta2-enoyl-Coenzyme A isomerase, clone MGC:5644 IMAGE:3591615
RCT-179        Rat nucleolar protein B23.2 mRNA
RCT-18         no significant homology found
RCT-180        Mus musculus B-cell receptor-associated protein 37 (Bcap37)
RCT-181        Mus musculus adult male testis cDNA
RCT-182        Rattus norvegicus gib mRNA for diacetyl/L-xylulose reductase
RCT-185        no significant homology found
RCT-187        Mus musculus 11 days pregnant adult female ovary and uterus cDNA, RIKEN full-length enriched library, clone:5033416F05, full insert sequence
RCT-189        Rattus norvegicus eukaryotic translation initiation factor 4E (Eif4e), mRNA
RCT-191        Mus musculus, Similar to proteasome (prosome, macropain) 26S subunit, non-ATPase, [illegible], clone [illegible], mRNA, complete cds
RCT-192        Mus musculus 18 days embryo cDNA, RIKEN full-length enriched library, clone:1110033J19
RCT-197        Rattus norvegicus Protein kinase, interferon-inducible double stranded RNA dependent [illegible], mRNA
RCT-20         Mus musculus cysteine and histidine-rich domain (CHORD)-containing,
RCT-204        Mouse DNA sequence from clone RP23-138F20 on chromosome 13, complete sequence [Mus musculus]
[illegible]    no significant homology found
[illegible]    Mus musculus Ran binding protein 5 mRNA, partial cds
RCT-211        Mus musculus adult male kidney cDNA, RIKEN full-length enriched library, clone:0610009C22
[illegible]    Homo sapiens pM5 protein (PM5) mRNA
[illegible]    Mus musculus putative NAD(P)H steroid dehydrogenase mRNA
RCT-215        Mus musculus RAB/Rip protein mRNA
RCT-219        Rattus norvegicus oligoadenylate synthetase-[illegible] mRNA, complete cds
RCT-221        no significant homology found
RCT-225        Rattus norvegicus chromosome 4 clone RP31-327J16 strain Brown Norway, complete sequence
RCT-227        no significant homology found
RCT-230        Mus musculus GDP-dissociation inhibitor mRNA, preferentially expressed in hematopoietic cells, complete cds
RCT-239        Mus musculus adult male tongue cDNA, RIKEN full-length enriched library, clone:2300007B01, full insert sequence
RCT-24         Mus musculus, tubulin alpha 8, clone MGC:28850 IMAGE:4507364, mRNA,
RCT-241        Mus musculus oncostatin receptor (Osmr), mRNA
RCT-242        Rattus norvegicus B-cell translocation gene, anti-proliferative [illegible],
[illegible]    Mus musculus claudin 10 (Cldn10-pending), mRNA
RCT-252        Mus musculus EH-domain containing 3 (Ehd3),
RCT-256        Mus musculus, Similar to betaine-homocysteine methyltransferase 2, clone
RCT-258        Mus musculus, clone MGC:6139 IMAGE:3487295, mRNA
RCT-264        Mus musculus sodium-sulfate cotransporter (Nasi) gene
RCT-270        Mus musculus, RIKEN cDNA 2010011I20 gene, clone MGC:27703, IMAGE:4924329, mRNA, complete cds
RCT-271        Homologous to Mus musculus, clone MGC:27581 IMAGE:4489072, mRNA
RCT-277        no significant homology found
RCT-278        Mus musculus brain protein 17 (Brp17), mRNA
RCT-285        Mus musculus, Similar to single Ig IL-1R-related protein, clone MGC:18899 IMAGE:4240425, mRNA, complete cds
RCT-287        Mus musculus adult male kidney cDNA clone:[illegible]
RCT-288        no significant homology found
RCT-289        Mus musculus adult male liver cDNA, RIKEN full-length enriched library, clone:1300003K24, full insert sequence
RCT-290        Homo sapiens chromosome 14 clone [illegible], complete sequence
RCT-291        no significant homology found
RCT-292        Rattus norvegicus 2'5' oligoadenylate synthetase-2
RCT-293        Mus musculus 18 days embryo cDNA, RIKEN full-length enriched library, clone:1110021C22
RCT-296        Mus musculus corticosteroid binding globulin (Cbg)
RCT-33         no significant homology found
RCT-34         no significant homology found
RCT-36         no significant homology found
RCT-37         no significant homology found
RCT-38         Mus musculus betaine-homocysteine methyltransferase 2 (Bhmt2) mRNA,
RCT-39         no significant homology found
RCT-40         Rattus norvegicus Cathepsin C (dipeptidyl peptidase I) (Ctsc)
RCT-48         Mus musculus adult male liver cDNA, RIKEN full-length enriched library, clone:1300003K24, full insert sequence
RCT-49         No match with score above 200
RCT-50         Mus musculus fibroblast growth factor regulated protein 2
RCT-55         M. musculus myoglobin gene exons 2-3
RCT-58         Rat mRNA for delta-4-3-ketosteroid 5-beta-reductase, complete cds
RCT-59         no significant homology found
RCT-61         no significant homology found
RCT-65         no significant homology found
RCT-66         M. musculus mRNA for low density lipoprotein receptor
RCT-68         Rattus norvegicus nucleosome assembly protein mRNA
RCT-70         Mus musculus adult male testis cDNA, RIKEN full-length enriched library, clone:4933406P04, full insert sequence
RCT-71         Mus musculus, clone MGC:11987 IMAGE:3601737, mRNA
RCT-72         no significant homology found
RCT-73         no significant homology found
RCT-75         Mus musculus adult male liver cDNA, RIKEN full-length enriched library, clone:1300002K09, full insert sequence
RCT-78         Mus musculus adult male lung cDNA, RIKEN full-length enriched library, clone:1200015G06, full insert sequence
RCT-81         no significant homology found
RCT-82         Mus musculus nucleosome binding protein 1 (Nsbp1),
RCT-83         no significant homology found
RCT-87         Mus musculus adult male tongue cDNA
RCT-88         no significant homology found
RCT-89         no significant homology found
RCT-92         no significant homology found
Table 33 Liver Hepatocellular Necrosis Predictive Genes
Whose Protein Products Are Known to be Secreted

Apolipoprotein AII

Apolipoprotein CI

Apolipoprotein CIII

C4b-binding protein

C-reactive protein

Cystatin C

Epidermal growth factor

Ferritin H-chain

Insulin-like growth factor I

Insulin-like growth factor I, exon 6

Interleukin-18

Lecithin:cholesterol acyltransferase

Macrophage inflammatory protein-1 alpha

Macrophage inflammatory protein-2 alpha 

Matrix metalloproteinase-1 

NGF-inducible antiproliferative putative secreted protein (PC3) 

Selenoprotein P 

T-cell cyclophilin 

Transthyretin 



Table 37 Predictive Performance of Predictive Genes Organized by Occurrence on 
Training/Test Set Lists (Combo number) and Time Point 



Time Point  Gene Set   Number of Genes  Accuracy**           Geometric Mean**
24 h        Combo All  142              0.924 (0.872-0.960)  0.917 (0.772-0.956)
24 h        Combo 5    3                0.896 (0.837-0.961)  0.901 (0.868-0.941)
24 h        Combo 4    7                0.857 (0.796-0.913)  0.862 (0.800-0.915)
24 h        Combo 3    22               0.865 (0.755-0.923)  0.900 (0.854-0.954)
24 h        Combo 2    36               0.912 (0.851-0.950)  0.904 (0.762-0.955)
24 h        Combo 1    74               0.894 (0.853-0.941)  0.844 (0.705-0.949)

6 h         Combo All  98               0.712 (0.610-0.833)  0.669 (0.474-0.804)
6 h         Combo 5    10               0.684 (0.597-0.756)  0.663 (0.451-0.794)
6 h         Combo 4    7                0.667 (0.623-0.756)  0.626 (0.545-0.754)
6 h         Combo 3    15               0.646 (0.534-0.704)  0.648 (0.598-0.722)
6 h         Combo 2    24               0.684 (0.571-0.833)  0.613 (0.474-0.744)
6 h         Combo 1    42               0.618 (0.494-0.846)  0.526 (0.385-0.617)

72 h        Combo All  130              0.790 (0.690-0.835)  0.729 (0.642-0.775)
72 h        Combo 5    1                0.641 (0.523-0.772)  0.615 (0.513-0.726)
72 h        Combo 4    17               0.749 (0.652-0.823)  0.664 (0.533-0.753)
72 h        Combo 3    21               0.715 (0.699-0.747)  0.660 (0.558-0.710)
72 h        Combo 2    23               0.713 (0.644-0.767)  0.602 (0.524-0.710)
72 h        Combo 1    68               0.570 (0.529-0.620)  0.631 (0.551-0.692)

Table 38: 266 Liver Toxicity Predictive Genes Organized by Time Point and Combo Class 





Gene    6 h    24 h    72 h
(Ribosomal protein L6)    Not Found    Combo 1    Combo 1
14-3-3 zeta    Combo 2    Combo 3    Combo 2
17-beta hydroxysteroid dehydrogenase, type 2    Combo 1    Combo 2    Not Found
25-DX    Not Found    Combo 1    Not Found
3-beta-hydroxysteroid dehydrogenase (HSD3B1)    Combo 1    Not Found    Not Found
3-hydroxyisobutyrate dehydrogenase    Not Found    Combo 2    Not Found
60S ribosomal protein L6    Not Found    Combo 2    Combo 4
8-oxoguanine DNA glycosylase    Combo 1    Not Found    Not Found
Acetyl-CoA carboxylase    Combo 2    Not Found    Not Found
Activating transcription factor [number illegible]    Combo 2    Not Found    Not Found
Adenine nucleotide translocator 1    Not Found    Not Found    Combo 2
Aflatoxin B1 aldehyde reductase    Not Found    Combo 1    Not Found
Alcohol dehydrogenase 1    Combo 1    Not Found    Not Found
Aldehyde dehydrogenase 2    Not Found    Not Found    Combo 1
Aldehyde dehydrogenase, microsomal    Not Found    Combo 1    Not Found
Alpha 1-inhibitor III    Not Found    Combo 2    Not Found
Alpha-1 microglobulin/bikunin precursor (Ambp)    Not Found    Not Found    Combo 1
Alpha-2-macroglobulin    Not Found    Combo 1    Not Found
Alpha-2-macroglobulin, sequence 2    Combo 4    Not Found    Combo 2
Alpha-2-microglobulin    Not Found    Not Found    Combo 3
Alpha-prothymosin    Not Found    Not Found    Combo 1
Alpha-tubulin    Not Found    Not Found    Combo 3
Annexin V    Not Found    Not Found    Combo 2
Apolipoprotein AII    Not Found    Not Found    Combo 1
Apolipoprotein C1    Not Found    Not Found    Combo 1
Apolipoprotein CIII    Not Found    Combo 1    Combo 1
Argininosuccinate lyase    Combo 5    Combo 1    Not Found
Arginosuccinate synthetase 1    Not Found    Not Found    Combo 1
ATPase inhibitor (rat mitochondrial IF1 protein)    Not Found    Combo 1    Combo 1
Bax (alpha)    Not Found    Combo 2    Not Found
Beta-actin    Not Found    Combo 2    Combo 2
Beta-actin, sequence 2    Not Found    Not Found    Combo 2
Betaine homocysteine methyltransferase (BHMT)    Not Found    Not Found    Combo 1
Beta-tubulin, class I    Not Found    Combo 1    Combo 2
Biliverdin reductase    Not Found    Not Found    Combo 3
C4b-binding protein    Combo 2    Not Found    Not Found
Calpactin I heavy chain    Not Found    Combo 1    Combo 5
Calpain 2    Not Found    Not Found    Combo 2
Carbamyl phosphate synthetase I    Not Found    Combo 1    Not Found
Carbonic anhydrase III    Combo 2    Combo 2    Not Found
Carbonyl reductase    Not Found    Combo 1    Not Found
Carnitine palmitoyl-CoA transferase    Combo 1    Not Found    Not Found
Caspase 6    Combo 1    Not Found    Not Found
Cathepsin B    Not Found    Not Found    Combo 1
Cathepsin L, sequence 2    Combo 5    Combo 4    Not Found
Cathepsin S    Not Found    Not Found    Combo 1
Cholesterol 7-alpha-hydroxylase (P450 VII)    Combo 2    Not Found    Not Found
Cholesterol esterase    Not Found    Not Found    Combo 1
Choline kinase    Combo 1    Not Found    Not Found
c-H-ras    Not Found    Combo 1    Not Found
c-jun    Combo 4    Combo 1    Not Found
c-myc    Combo 5    Combo 2    Not Found
Cofilin    Not Found    Combo 1    Combo 3
Collagen type II    Not Found    Not Found    Combo 4
Connexin-32    Not Found    Not Found    Combo 1
Contrapsin-like protease inhibitor (CPi-21)    Not Found    Not Found    Combo 1
C-reactive protein    Not Found    Not Found    Combo 1
Cyclin D1    Not Found    Not Found    Combo 2
Cyclin D3    Combo 1    Not Found    Not Found
Cyclin dependent kinase 4    Combo 3    Not Found    Not Found
Cyclin G    Not Found    Combo 1    Combo 1
Cystatin C    Not Found    Not Found    Combo 2
Cytochrome P450 1A1    Combo 2    Not Found    Not Found
Cytochrome P450 2C11    Not Found    Not Found    Combo 2
Cytochrome P450 2C23    Not Found    Not Found    Combo 1
Cytochrome P450 2D18    Not Found    Not Found    Combo 1
Cytochrome P450 2E1    Combo 1    Not Found    Not Found
DNA polymerase beta    Not Found    Combo 1    Not Found
DNA topoisomerase I    Combo 2    Not Found    Not Found
Dynamin-1 (D100)    Not Found    Combo 3    Combo 1
Elongation factor-1 alpha    Combo 1    Combo 1    Not Found
Endogenous retroviral sequence, 5' and 3' LTR    Not Found    Combo 1    Not Found
Enolase alpha    Not Found    Combo 1    Not Found
Epidermal growth factor    Not Found    Combo 2    Not Found
Equilibrative nitrobenzylthioinosine-sensitive nucleoside transporter    Not Found    Combo 2    Combo 1
Extracellular-signal-regulated kinase 1    Not Found    Combo 1    Not Found
Fas antigen    Not Found    Combo 1    Not Found
Fatty acid synthase    Not Found    Not Found    Combo 1
Ferritin H-chain    Combo 2    Not Found    Not Found
Focal adhesion kinase (pp125FAK)    Combo 3    Not Found    Not Found
Gadd153    Combo 5    Combo 1    Not Found
Gadd45    Combo 5    Combo 4    Not Found
Gamma-actin, cytoplasmic    Not Found    Combo 5    Combo 4
Gap junction membrane channel protein beta 1 (Gjb1)    Not Found    Not Found    Combo 1
Glucokinase    Combo 3    Not Found    Not Found
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Glucose-regulated protein 78 
Glutathione S-transferase theta-1 
Glycine methyltransferase 
Heme oxygenase 
Hepatic lipase 

High affinity IgE receptor gamma chain (FcERIgamma) 
H-rev107 

Hypoxanthine-guanine phosphoribosyltransferase 
ID-1 

IgE binding protein 
IkB-a 

Insulin-like growth factor binding protein 1 
Insulin-like growth factor binding protein 3 
Insulin-like growth factor binding protein 5 
Insulin-like growth factor I 
Insulin-like growth factor I, exon 6 
Integrin beta1 

Interferon related developmental regulator IFRD1 (PC4) 
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Table 40 Liver Predictive Genes that are 
Most Predictive Across all Three Time 
Points 
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oxygenase Combo 5 Combo 2 Combo 3 
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TGGCGAATTGGGCCCTCTAGATGCATGCTCGAGCGGCCGCCAGTGTGATGGATATCTGCAGAATTCGCCCTTCGCG 
GGATCCCAAAGAAGCCACTGGAAGCCTTCACTGTGTGTCTCTATGCCCACGCTGATGTGAGCCGAAGCTTCAGCATC 
TTCTCTTACGCTACCAAGACGAGCT1TAAOGAGATTCTTCTG 

GGTGGGCCTGAAATACTGTTCAGTGCTTCAGAAATTCCTGAGGTACCAACACACATCTGTGCCACCTGGGAGTCTAC 

rACAGGAGTTGTAGAGCTTTGGCTTGACGGGAAACCCAGGGTGCGGAAAAGTCTGCAGAAGGGCTACATTGTGGGG 

ACAMTGCMGCATCATCTTGGGGCAGGAGCAGGACTCGTATGGCGGTGGCTTTGACGCGAATCAGTCTTTGGTGG 

GAGACATTGGAGATGTGMCATGTGGGACTTTGTGCTATCTCCAGAACAGATCAATGCAGTCTATGTTGGTAGGGTAT 

TCAGCCCCAATATnTGAACTGGCGGGCACTGAAGTACTCGAGGGCCAAGGGCGAATTCCAGCACACTGGCGGGCG 

TACTAATGGATCCGAGCTCGGACCAAGCTTGATGCATAACTTGA 
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AGTTATCAATATTGTTACTTGTAGCGGCCTGTTGTGCATGCCACCATGCTGCTGGACCCGGAGAGATTTGTTCTGAGT 

CTCTGGTGCATCATTTAATCTGTTAGGTTCTAGTGTTCTGTCTTGTTTTGTGTTACTCACAGCATTGTGCT 

CCAGCCGCAATGCTGTAGGCCCCAGGTTCCCTAGCAAGCTGCCAAACCAAAAGGGTCACCACCAGCTTAGCTGAGG 

CGTCCCAACCAGGCAGGACCCTTGAGGGCTGCTGTGTCCATGGTGATGGGGTGAAGTTTTGGCCAAAGGGCCAAAG 

GCTGGTGGATCCACACAGTCTGCCCTGTGACATGAATGGCTTTGAGGGGCTCTGGCTGGTGGTCAGGTTGGCi 1 1 1 

GTGTATTCTGGTTGACACACCATGGCAAGCTTGGCCAAGGGCGAATTCCAGCACACTGGCGGGCCGNTACTAAGTG 

GATCCGMCTTCGGTACCMGCTTGATGCATAGCTTTGAGTATTCTATAGTGNCACCTAAATAGCTTGGCGTAATCAT 

GGCATAACTTGTT 
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GGATCCCGCCCCAGTACCTTCCAAAGGTGTTGTTGCCCCTCGCAGGGTCACTGCATTTGGATCTGGGTCCTTCAGAA 
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AGCCTAGGCCTTGGCTAGAAGAGCAAGCGCCCGTAAACTGTTGCTTTGCTTCCTGCT 

CTTGAGGGTGCTGATGGTCATTTTAATTTATTGCTTTGAATACMCCGTAAGAGGGTACAGTG 

AAGTGGTGGTAACCCTGGCGGTTGCTCTTTCCCTCCCCTCTGCTACCGCTTTGTGGCCCAGGAGCCGCTACAGCCTG 
GGAGGGGGTCCTGCCTTCCTCTCCGTAGCCCTCCAGCTCATCrrCAGCGGGGAGGGTTTAATAGGGATGGATGCCC 
GTGGAGGTGACTGGACTATCCGGAGAGAGGGCGAGMGCTTGGCCAAAAGGGGCGAATTCNANCACACTGGCGGC 
CGTTACTAGT 


TTGGAATTGGGCCCTCTAGATGCATGCTCGAGCGGCCGCCAGTGTGATGGA I A I C I GCAGAA I ICGCCCI lUaUaU 

GATCCAGGATCTGATGCGCCAGTTTCTAAGCGGCCTAGATTTCCTTCATGCAAACTGCATTGTTCACCGGGACCTGAA 

GCCAGAGAACATTCTAGTGACAAGTAATGGGACAGTTAAGCTGGCCGACTTTGACCTAGCCAGAATCTACAGCTACC 

AGATGGCCCTCACGCCTGTGGTTGTTACGCTCTGGTACCGGGCTCCTGAAGTTCTTCTGCAGTCTACATATGCAACG 

CCTGTGGATATGTGGAGTGTTGGCTGTATCTTCGCAGAGATGTTTCGCCGGAAGCCTCTCTTCTGTGGGAACTCTGA 

GGCTGACCAGCTGGGCAAAATCTTTGATCTCATTGGATTGCCTCCAGAAGACGACTGGCCTCGAGAGGTCTCTCTTC 

CTCGAGGAGCCTTTTCCCCCAGAGGACCTCGGCCAGTGCAGTCAGTGGTGCCGGAGATGGAGGAATCTGGAGCGC 

AGTTGCTGCTGGAMTGCTGACCTTTMTCC^CTTMGCGMGCTTGGGCAAGGGCGMTTCCAACACACTGGCGGG 

CGGTACTAATGGATCCGAGCTCGNACCAACTTGATGCATAGCTTGAGTN 
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AATTGGCCCTCTAGATGCATGCTCGAGCGGCCGCCAGTGTGATGGATATCTGCAGAATTC6CCCTTATCGCGGGATC 

CTGCAGCAGCAGTTTAAAGAGCTCACAGCCCCTGATGAGAATGTACCAGCAAAGATTCTATCTTATAACCGTGCCAAT 

CGCGCTGTTGCAATTCTTTGTAACCACCAGAGGGCGCCACCAAAGACCTTTGAGAAGTCAATGATGAACTTGCAGTC 

TAAGATTGATGCCAAGAAAGATCAGCTAGCAGATGCTCGAAAGGACCTGAAAAGCGCTAAGGCTGATGCCAAAGTCA 

TGAAGGATGCAAAGACCAAGAAGGTAGTAGAGTCAAAAAAGAAGGCTGTACAGAGATTAGAAGAGCAGCTGATGAAG 

CTGGAGGTTCAAGCCACAGACCGAGAGGAGAACAAACAAATTGCC7TGGGGACCTCCAAACTCAATTACCTGGACCC 

TAGGATCACAGTGGCTTGGTGCAAAAAATGGGGGGTCCCAATCGAGAAGATTTACAACAAAACCCAGAGAGAGAAGT 

TTGCTTGGGCCATrGATAAGCTTGGCCAAAAGGGCGAATTCCAGCACACTGGCGGCCGhfTCTAGTGGATCCGAACT 

CGGTCCAAACTTGATGCATACTTGAGTATTCTATAA 


TATGACATG ATTACGAATTTAATACGACTCACTATAGGGAATTTGGCCC 1 UUAGGUUAAliAA 1 1 u^L»w\^VjrtUOV3rtA 

GTACATGCTGTCTGTGGATAATCTGAAGCTGCGTGATGTGGAGAAGGGCTTCATGTCAAGCAAGCATAI 1 1 1 IGCCCT 

CTTCAACACTGAGCAGAGGAATGTCTACAAGGATTACCGGCAGCTGGAACTGGCCTGTGAGACACAGGAGGAGGTG 

GACAGTTGGAAGGCTTCCTTTCTGAGGGCTGGCGTGTACCCTGAGCGTGTTGGGGACAAGGAGAAAGCCAGTGAGA 

CTGAGGAGMCGGCTCTGACAGCTTCATGCACTCCATGGACAGCTGTTCTGTGACTTGACAGTGGCTCCCCCGACCC 

CAAAGCGTCCTATTCATCATCTGTGACTTAGTCTGTTGTAGTGGTGAGCTGATACATCCAAATGTGATGTTGGTGAAA 

ACTTGTGCCCCTCTGTGGTATTGCTCCTGCGACATTCTATAAATATCTATAAATATACCTATATATACATATATATACAT 

ATATATGGCTGACACAGCCTCGCCTGTAGTGTCAGAATCAACCACCATGCTGTCCTTGTGGAGTCCTGTGGTCCAAC 

TACAANGGAACGCTGTCATCCTGACATCCCCCCTCCAATGTGCCACCTCCAGTGAGCCTCCCTGTCCCTTGGNCTGT 

GGACAGCCATCCCCTTGGCCATCCCCAC 


CACTATAGAATACTCAAGCTATGCATCAAGCTTGGNCCGAGCTCGGATCCAC I AG I ACCGGUUUUUAU lulvl 1 l^LiA 

ATTCGCCCTTTGAAGCTTTTGAGTGAAGCTCTGCCTGGGGACMTGTAGGCTTCAACGTAAAGMCGTGTCTGTCA^ 

GACGTTAGACGTGGCAATGTTGCTGGGGACAGCAAAAATGACCCACCAATGGAAGCAGCTGGCTTCACTGCTCAGG 

TGATTATCCTGAACCATCCAGGCCAGATCAGTGCTGGCTATGCCCCTGTTCTGGACTGCCACACGGCCCACATAGCA 

TGCAAGTTTGCCGAGCTTAMGAGAAGATCGATCGTCGTTCTGGTAAGAAGCTGGAAGATGGCCCCAAATTCTTGAA 

GTCTGGTGATGCTGCCATTGTTGACATGGTCCCTGGCAAGCCCATGTGTGTTGAAAGCTTCTCTGACTACCCTCCACT 

TGGTCGTTTTGCTGTTCGTGACATGAGGCAGACAGTTGCTGTGGGTGTCATCAAAGCCGTGGACAAGAAGGCTGCA 

GGAGCTGGCAAAGTCACCAAGTCTGCCCAGAAAGCTCAGAAGGCTAAATGAATATTATCCCTAACACCTGCCACCCC 

AGTCTTAATCAAAGGGCGAATTCTGCAGATATCCATCACACTGGCGGCCGCTCGAGCATGCATCTAGAGGGCCCAAT 

TCGC 


TTATGACATGATTACGAATTTAATACGACTCACTATAGGGAATTTGGCCCTCGAGGCUAAUAA I 1 U^W^^ttu 

TGACAGAAGCACATGGGTGGAGGAAAGACCTCTGCGACTGGCTGATTGATCCCTGCTGAAAGCCGAGGACCTTGTC 

CACAGACAGGAACAGTTCTCTTCATGAATGAAGGTCAGAGACAAGTGGGTGCTGCCATGGTGGACAACACAATGTCA 

TCTAGAATGGCTGAACCTCTACCCCCCCAGCACATCAGCGCCACAGATGCCTTTGCCATCTCTTGGATGCCCTGATA 

AAGCCAACAACTGTGAGTACTATTCATTGCCCAGGAAAACAGGAGGGAAGAGATTCAGTGACATGGGGCAGTGACAA 

AACAAATAAAGTGGCTCGGGAAATGGCTATACAGGAGCCTATCCTGGTTACAGGCCTGCAAGAGACAGCCACTGAGA 

ACTAGGATTAAACTMGGGGATGGCCTCACTTAGAAAAGGCCCAAGTTGTTTTAAAAGATAAAAAGACNATGACACAC 

TTGAGGGGAAGGCTATACTCCCCAGAAAACAAAGAAAAAGACTCACTTTGCCAAATACAGAAATGGACTCATTTAGGA 

GATAAAAANTTTGTCCCMGTAGTTAAGGTATATGTAATANACTTTAAAMTTITANCCCAAGAGAGACAG 


TAGNTG NC ATAATAAATACTC AAGTATGCATCAAAGTTTG GTACCGAGCTCGG A 1 CCAU 1 AG 1 ACOUbUOOUUOAO 1 

GTGCTGGAATTCGCCCTTATCGCGGGATCCGCTGCAGGCGAAAGTCCTGCAACTGCCTCCTGCTCAAAGTGAACCA 

GATTGGCTCCGTGACCGAGTCTCTGCAGGCGTGTAAGCTGGCCCAGTCCAATGGCTGGGGTGTCATGGTGTCCCAT 

CGATCTGGGGAGACTGAGGACACTTTCATTGCCGACCTGGTGGTGGGGCTCTGCACTGGGCAGATCAAGACTGGTG 

CCCCCTGCCGATCTGAGCGCCTGGCCAAGTACAATCAGATCCTTAGAATCGAGGAGGAGCTGGGCAGCAAAGCCAA 

GTTTGCCGGCAGGTCCTTCAGGAACCCCCTGGCCAAGTAAGGCATGGACCGGAGATCCCTGGAGCTACCAGATCCT 

CTGTCTCTGTCATCCAGGCGGCTCAAGGCTGGCCCAGTGCTTGCCCCTCCCATGTCACTGCTTCCTTAGATGTCCAC 

CCCGACCACCTGGAGCCCTGCTGGAGCCCCCAGCTTTGTAAAAGCTTGGCCAAAAGGGCGAATTCTGCAGATATCC 

ATCACACTGGCGGCCGCTCGAGCATGCATCTAGAGGGCCCAATTCGC 
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DNA topoisomerase I 


Dynamin-1 (D100) 


Elongation factor- 1 alpha 


Endogenous retroviral sequence, 5' and 3' LTR 
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gctggagggacaagaagcaaaggggtaggtaaaagagctcatggtgtcaactgcagacaagccaagttgtgaatcc 
tggtcagcacacccagagacttagtctagaaatccctccaggatgcctggatacctgtgctcccactgacctcagat 

GAGGGCCTGCTGTGGGACTGTGGTCCTTGGAAATCACTACCCTCTTGACGACCCAGGCACAACGGCATTACGTCATT 
CTGTTCTCATTCATATTGTTTGCTCATGGTCA^ 

GGAATGGGGGGAGCGTTGGGAGCAGAGTCCATGAACAATTTTGTCCCTCAGACTGTCTTCAI 1 1 \ IGGATGAGAGTG 
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ATCATGGCTCCGGGTrGGCCGCGGCCTCTGCCGCAGCTCCTCGTGTTGGGATTCGGGTTGGTGTTGATACGCGCCA 

CGGCCGGGGAGCAAGCACCAGGCAACGCCCCATGCTCAAGCGGCAGCTCCTGGAGCGCGGACCTCGACAAGTGCA 

TGGACTGCGCrTCTTGTCCAGCGCGACCACACAGCGACTTCTGCCTGGGATGCGCAGCAGCACCTCCTGCCCACTT 

CAGGATGCTATGGCCCATrCTGGGAGGCGCTCTTAGTCTGGCCCTGGTTTTGGCGCTGGTTTCTGGTTTCCTGGTCT 

GGAGACGATGCCGCCGGAGAGAAAAGTTTACTACCCCCATAGAGGAGACTGGTGGAGAAGCTGCCCAGGTGTGGC 

ACTGATCCAGTGAGGAGCACCCGCGCTGGTGCCCATTCATCGTCCATTCATTCATTCTGGAGCCAGCCTGGCTTTCC 

AGAGACAAGCCGCGCCAGACTCTTCCAACCACAAGGGGGTGGGGCGAGGTGGTGATTCACCTCCAAGGACTGGGC 

TTANGGTTCAGGGGANCCTTCCAGGGTGTCTAATTGCCCTGTCTCTGGlsrTCTGGGGCAGACAGANANCCTCAAGCTA 
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GAATTGGGCCCTCTAGATGCATGCTCGAGCGGCCGCCAGTGTGATGGATATCTGCA1GAATTCGCCCTTATC6CGGG 

ATCCCCAAGGGTCCTTGTACTCCCTTCTTCCTGATCACAGTGTGAAGAAATACTTTGACCAAGTGGATATCTCCAATG 

GTTTGGATTGGTCCCTGGACCATAAAATCTTCTACTACATTGACAGCCTGTC^TACACTGTGGATGCCTTTGACTATC 

ACCTGCCAACAGGACAGATTTCCAACCGCAGAACTGTTTACAAGATGGAAAAAGATGAACAAATCCCAGATAGAATGT 

GCATTGATGTTGAGGGGAAGCTTGGGGTGGCCTGTTACAATGGAGGAAGAGTAATTCGCCTAGATCCTGAGACAGG 

GAAAAGACTGCAAACTGTGAAGTTGCCTGTTGATAAAACAACTTCATGCTGCTTTGGAGGGAAGGATTACTCTGAAAT 

GTACGTGACATGTGCX^GGGATGGGATGAGCGCCGAAGGTCTTTTGAGGCAGCCTGATGCTGGTAACATTTTCAAGA 

TMCAGGTCnTGGGGTCAAAGGMTTGCTCCATACTCGGGCAMGGGCGMTTCCAGCACACTGGCGGCCGTTACTA 

GTGGATCCGAGCTCGGACCAAACTTGATGCATACTTGAGTAT^ 
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GGCGGCTCGGACTGAGCAGGGCTTTCCTTGCCAGTGGATTGTGTAGAGTGTACAGCCAGTCTCTTGTCTTCTGTCCA 

ACATGGCATCTTCTGATATTCAGGTGAAAGAGCTGGAGMGCGTGCTTCCGGCCAGGCTTTTGAGCTGATTCTCAGC 
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CCACAAAATGGAGGCTMCAMGAGMCCGGGAGGCGCAMTGGCTGCCAAGCTGGAGCGTTTGCGAGAGAAGGA 

CAAGCACGTTGAAGAGGTGCGGAAGAACAAAGAATCCAAAGACCCCGCGGACGAGACCGAGGCTGACTAAGTTGTT 

CCGAGAACTGACTTTCTCCCGACCCCTTCCTAAATATTCANAGACTGTACTGGNGCAG 
(1) Gene expression data for 6 hour 
timepoint are presented as mean ratio of 
treatment/control for all 6 hour predictive 
genes (Table 20). 


(2) Compound and dose abbreviations as 
in Table 1. 


(3) Individual animal number 


(4) Liver necrosis classification for 
compound-dose group at 72 h: yes, 
necrosis observed; no, no necrosis 
observed 


(5) Predictive gene (as in Table 20 and as 
NADPH quinone oxidoreductase-1 (DT-diaphorase) 


(1) Gene expression data for 24 hour 
timepoint are presented as mean ratio of 
treatment/control for all 24 hour predictive 


(2) Compound and dose abbreviations 
as in Table 1. 


(3) Individual animal number 

(4) Liver necrosis classification for 
compound-dose group at 72 h: yes, 
necrosis observed; no, no necrosis 


(5) Predictive gene (as in Table 5 and as 
included in Table 32) 
Table 36. Expression Data for 72 Hour 
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